Ashutosh Mestry created ATLAS-2434:
--------------------------------------
Summary: Import: Performance Improvement
Key: ATLAS-2434
URL: https://issues.apache.org/jira/browse/ATLAS-2434
Project: Atlas
Issue Type: Bug
Components: atlas-core
Affects Versions: trunk
Reporter: Ashutosh Mestry
Assignee: Ashutosh Mestry
Fix For: trunk
*Background*
The introduction of _relationships_ within Atlas caused the
_EntityMutationResponse_ to report many more entities as modified than before.
This has an adverse impact on performance for bulk entity creation.
Bulk entity creation happens during the import process. Single entity creation.
*Behavior*
During import, in a typical scenario where a database is being imported, the
list of updated entities in the _EntityMutationResponse_ grows progressively.
This happens because every edge created between a database and its tables, or
between a table and its columns, is marked as an updated entity.
Import thus slows down progressively.
An import of a ZIP file used for benchmarks showed:
* Branch-0.8 (last release): 2 minutes.
* Master (current development): 40+ minutes.
Performance deteriorates further as the size of the import increases.
*Possible Solution*
During the import process, avoid marking entities as modified when they are
affected only by the creation of a relationship edge.
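
The effect of such a change can be illustrated with a toy model. The sketch
below is not Atlas code: the class _ImportMutationSketch_, its _Response_ type
and the _markEdgeEndpointsUpdated_ flag are all hypothetical names used only
for illustration. It assumes each table and column adds one relationship edge
to its parent, and it counts how many "updated" entries accumulate with and
without the marking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names are Atlas APIs.
public class ImportMutationSketch {

    /** Stand-in for a mutation response: tracks created and updated entity names. */
    public static final class Response {
        public final List<String> created = new ArrayList<>();
        public final List<String> updated = new ArrayList<>();
    }

    /**
     * Simulates importing one database with the given number of tables and
     * columns per table. Every child entity adds one edge to its parent.
     * When markEdgeEndpointsUpdated is true, the parent is recorded as updated
     * for each edge (the behavior described above); when false, it is skipped.
     */
    public static Response importDatabase(int tables, int columnsPerTable,
                                          boolean markEdgeEndpointsUpdated) {
        Response response = new Response();
        create(response, "db", null, markEdgeEndpointsUpdated);
        for (int t = 0; t < tables; t++) {
            String table = "db.table" + t;
            create(response, table, "db", markEdgeEndpointsUpdated);
            for (int c = 0; c < columnsPerTable; c++) {
                create(response, table + ".col" + c, table, markEdgeEndpointsUpdated);
            }
        }
        return response;
    }

    private static void create(Response response, String name, String parent,
                               boolean markEdgeEndpointsUpdated) {
        response.created.add(name);
        // The edge to the parent is what causes the extra "updated" entry.
        if (parent != null && markEdgeEndpointsUpdated) {
            response.updated.add(parent);
        }
    }

    public static void main(String[] args) {
        Response marked = importDatabase(100, 20, true);
        Response skipped = importDatabase(100, 20, false);
        System.out.println("with marking:    created=" + marked.created.size()
                + " updated=" + marked.updated.size());
        System.out.println("without marking: created=" + skipped.created.size()
                + " updated=" + skipped.updated.size());
    }
}
```

In this model the number of updated entries equals the number of edges, so it
grows with the size of the import. With the marking turned off, the response
holds only the created entities. How the real import code would carry such a
flag is left open here.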
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)