[ https://issues.apache.org/jira/browse/ATLAS-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Mestry updated ATLAS-3132:
-----------------------------------
Description:
*Background*
The Java patch framework (now called the data patching framework), introduced
recently, patches entities at a rate of about 1 million entities per 15 hours.
This can be improved.
*Proposed Solution*
* Use the Producer-Consumer framework to spawn multiple workers to perform
concurrent updates to entity vertices.
* Use _AtlasGraph_ in bulk-loading mode to further improve performance.
* Perform duplicate data checks during processing.
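A minimal sketch of the producer-consumer idea with a duplicate check, written against plain java.util.concurrent. It is not Atlas's own Producer-Consumer framework: the names `ConcurrentPatchSketch`, `patchAll` and `applyPatch` are hypothetical, string entity IDs stand in for entity vertices, and the duplicate check is assumed to mean "skip an entity that another worker has already handled". A single producer fills a bounded queue, several workers drain it concurrently, and one poison pill per worker signals the end of input.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPatchSketch {
    // Sentinel that tells a worker the producer is done (assumes no real ID uses it).
    private static final String POISON_PILL = "__END__";

    /** Patches each distinct entity ID once using numWorkers consumers; returns the count patched. */
    public static int patchAll(List<String> entityIds, int numWorkers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000); // bounded: producer blocks when full
        Set<String> seen = ConcurrentHashMap.newKeySet();              // shared duplicate check
        AtomicInteger patched = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);

        // Consumers: take IDs until a poison pill arrives.
        for (int i = 0; i < numWorkers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String id = queue.take();
                        if (id.equals(POISON_PILL)) {
                            return;
                        }
                        // add() returns false if this ID was already claimed by any worker.
                        if (seen.add(id)) {
                            applyPatch(id);
                            patched.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: enqueue every ID, then one poison pill per worker.
        try {
            for (String id : entityIds) {
                queue.put(id);
            }
            for (int i = 0; i < numWorkers; i++) {
                queue.put(POISON_PILL);
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return patched.get();
    }

    // Hypothetical placeholder: a real patch would update the entity's vertex here.
    private static void applyPatch(String entityId) {
    }

    public static void main(String[] args) {
        List<String> ids = List.of("e1", "e2", "e3", "e2", "e4", "e1");
        System.out.println(patchAll(ids, 4)); // prints 4 (e1 and e2 are skipped the second time)
    }
}
```

The bounded queue keeps memory flat when the producer is faster than the workers, and the concurrent set makes the duplicate check safe without a global lock.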
*Projected Performance Improvement*
* Based on various tests, these changes increase throughput. The projected
new rate is ~300K entities per 5 minutes, roughly 54 times the current rate.
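The comparison between the two rates is simple unit conversion; the sketch below only restates the issue's own figures (1 million entities in 15 hours versus ~300K entities in 5 minutes) as entities per minute. `PatchRateComparison` and `perMinute` are illustrative names, and the inputs are projections, not measurements.

```java
public class PatchRateComparison {
    // Entities patched per minute for a given entity count and duration in minutes.
    public static double perMinute(double entities, double minutes) {
        return entities / minutes;
    }

    public static void main(String[] args) {
        double oldRate = perMinute(1_000_000, 15 * 60); // 1M entities in 15 hours
        double newRate = perMinute(300_000, 5);          // ~300K entities in 5 minutes
        double speedup = newRate / oldRate;
        double minutesForOneMillion = 1_000_000 / newRate;
        System.out.printf("old: %.0f/min, new: %.0f/min, speedup: %.0fx, 1M entities in %.1f min%n",
                oldRate, newRate, speedup, minutesForOneMillion);
        // prints: old: 1111/min, new: 60000/min, speedup: 54x, 1M entities in 16.7 min
    }
}
```

At the projected rate, a patch that previously took 15 hours for 1 million entities would finish in under 20 minutes.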
> Data Patch Fx: Improve Data Patching Performance
> ------------------------------------------------
>
> Key: ATLAS-3132
> URL: https://issues.apache.org/jira/browse/ATLAS-3132
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core
> Affects Versions: trunk
> Reporter: Ashutosh Mestry
> Assignee: Ashutosh Mestry
> Priority: Major
> Fix For: trunk
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)