-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
Lines 344 (patched)
<https://reviews.apache.org/r/72287/#comment308499>

    edgeLabel is typically used to find a subset of edges from a given vertex. 
Having an edge index on the label probably won't help improve performance; 
however, we need to understand the impact of creating this index in an existing 
Atlas instance that has a large number of edges. 1) Would the index be populated 
with existing edge labels? 2) If yes, how long would index creation take, say 
for 1m edges? 3) If no, would search ignore edges that were not indexed?
    
    I suggest measuring the performance impact of not having this index.
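
    To illustrate the concern, here is a minimal sketch using a toy in-memory 
model. It is not Atlas or JanusGraph code, and the class, method and label 
names are hypothetical. It compares how many candidate edges a lookup examines 
when it starts from a vertex with how many it examines when it starts from a 
global label index. How the actual graph backend behaves would still need to be 
measured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model; NOT Atlas or JanusGraph code. All names are hypothetical.
class EdgeLabelLookupSketch {
    static final class Edge {
        final int from;
        final int to;
        final String label;

        Edge(int from, int to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // vertex id -> edges leaving that vertex (adjacency list)
    private final Map<Integer, List<Edge>> outEdges = new HashMap<>();
    // label -> every edge carrying that label (stand-in for a global edge-label index)
    private final Map<String, List<Edge>> labelIndex = new HashMap<>();

    void addEdge(int from, int to, String label) {
        Edge e = new Edge(from, to, label);
        outEdges.computeIfAbsent(from, k -> new ArrayList<>()).add(e);
        labelIndex.computeIfAbsent(label, k -> new ArrayList<>()).add(e);
    }

    // Candidates examined when starting from the vertex: only its own edges.
    int candidatesViaVertex(int vertex) {
        return outEdges.getOrDefault(vertex, Collections.emptyList()).size();
    }

    // Candidates examined when starting from the label index: all edges with that label.
    int candidatesViaLabelIndex(String label) {
        return labelIndex.getOrDefault(label, Collections.emptyList()).size();
    }

    // Vertex-first lookup: filter the vertex's edges by label.
    List<Edge> byVertexThenLabel(int vertex, String label) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : outEdges.getOrDefault(vertex, Collections.emptyList())) {
            if (e.label.equals(label)) {
                result.add(e);
            }
        }
        return result;
    }

    // Index-first lookup: filter the label's edges by source vertex.
    List<Edge> byLabelThenVertex(int vertex, String label) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : labelIndex.getOrDefault(label, Collections.emptyList())) {
            if (e.from == vertex) {
                result.add(e);
            }
        }
        return result;
    }

    // Builds n vertices, each with one "r_owner" and one "r_column" edge (made-up labels).
    static EdgeLabelLookupSketch sample(int n) {
        EdgeLabelLookupSketch g = new EdgeLabelLookupSketch();
        for (int v = 0; v < n; v++) {
            g.addEdge(v, (v + 1) % n, "r_owner");
            g.addEdge(v, (v + 2) % n, "r_column");
        }
        return g;
    }

    public static void main(String[] args) {
        EdgeLabelLookupSketch g = sample(1000);
        System.out.println("vertex-first candidates: " + g.candidatesViaVertex(7));
        System.out.println("index-first candidates:  " + g.candidatesViaLabelIndex("r_owner"));
    }
}
```

    In this toy model both lookups return the same single edge for vertex 7, 
but the vertex-first path examines 2 candidates while the index-first path 
examines 1000. That is only an illustration of why a label index may add 
little when the starting vertex is already known.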


- Madhan Neethiraj


On March 30, 2020, 11:19 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72287/
> -----------------------------------------------------------
> 
> (Updated March 30, 2020, 11:19 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3706
>     https://issues.apache.org/jira/browse/ATLAS-3706
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> 
> 1. Added metrics to most of the methods in entity creation. (The patch does 
> not include the additional metrics added in other places.)
> 2. Started importing a large number of entities using the 
> _ZipFileMigrationImporter_.
> 3. Observed the behavior of the import over 24 hours. Observations included CPU 
> usage, memory usage and the import throughput using the _metric.log_.
> 4. Changes were added one at a time. The impact of each change on performance 
> (via metric.log) and accuracy was observed before the next change was added.
> 
> **Observations**
> * Relationship creation took an inordinately large amount of time under load. 
> The time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This 
> implementation also caused a memory build-up of _AtlasEdge_ objects, which 
> stayed in memory for a long time. This had the secondary effect of slowing down 
> entity creation operations after about 6 hours (this duration differed with 
> node configuration).
> * _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is 
> time consuming.
> * _GraphBackedSearchIndexer_ edge label index: the majority of edge creation 
> operations included a lookup by edge label.
> 
> **Configuration**
> Cluster: 3 node: 40 cores, 128 GB RAM, 1.5 TB of disk space.
> Atlas configuration: 32 GB RAM.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java 647e3040c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 5ab9f4d13 
> 
> 
> Diff: https://reviews.apache.org/r/72287/diff/1/
> 
> 
> Testing
> -------
> 
> **Manual tests**
> (See above).
> Accuracy verification.
> 
> **Unit tests**
> Executed existing unit tests.
> 
> **Pre-commit build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>
