[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Mestry updated ATLAS-3762:
-----------------------------------
    Attachment: ATLAS-3762-Edge-fetch-improvement-gremlin.patch

> Entity Creation: Improve Edges Fetch Between Vertices
> -----------------------------------------------------
>
>                 Key: ATLAS-3762
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3762
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>         Attachments: ATLAS-3762-Edge-fetch-improvement-gremlin.patch,
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced the vertex and edge fetch with
> _StreamSupport.stream_. That code uses _Collect(toList)_, which causes all
> contents to be fetched, so a large amount of data is loaded.
> *Solution*
> Switch to iterators, which load lazily.
> *Edge Fetch Refactoring*
> Change _getEdge_ to iterate over the smaller dataset.
> Here are the scenarios:
> - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that
> outgoing edges from _fromVertex_ will be many more than incoming edges to
> _toVertex_.
> - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This
> means that outgoing edges from _fromVertex_ will be fewer than incoming edges
> to _toVertex_ (_hive_table_).
> Approach:
> * Because the search is linear, it is more efficient to iterate over
> fewer items than over more items.
> * Fetch the edge counts for _fromVertex_ (outgoing) and _toVertex_
> (incoming). If either count is 0, return NULL, since no edge can be found.
> * If both counts are non-zero, take the side with fewer edges and
> search it.
> [~sidharthkmishra] Thanks for this simple but effective fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
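
The approach described in the issue can be illustrated with a minimal Java sketch. It is not the code from the attached patches and does not use the Atlas graph API: the `Vertex` and `Edge` classes, the `connect` helper, and the `findEdge` signature are all hypothetical stand-ins, and edges are held in in-memory lists so that the counts are cheap to read. The sketch only shows the selection logic: return null when either side has no edges, otherwise iterate over the smaller side.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EdgeFetchSketch {
    /** Hypothetical in-memory vertex; a stand-in, not the Atlas vertex type. */
    public static final class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>(); // outgoing edges
        final List<Edge> in = new ArrayList<>();  // incoming edges
        public Vertex(String name) { this.name = name; }
    }

    /** Hypothetical directed, labelled edge. */
    public static final class Edge {
        final Vertex from;
        final Vertex to;
        final String label;
        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /** Creates an edge and registers it on both endpoints. */
    public static Edge connect(Vertex from, Vertex to, String label) {
        Edge e = new Edge(from, to, label);
        from.out.add(e);
        to.in.add(e);
        return e;
    }

    /** Finds the edge from -> to with the given label, scanning the smaller side. */
    public static Edge findEdge(Vertex from, Vertex to, String label) {
        int outCount = from.out.size();
        int inCount = to.in.size();

        // If either side has no edges, no connecting edge can exist.
        if (outCount == 0 || inCount == 0) {
            return null;
        }

        // The search is linear, so walk whichever side has fewer edges.
        boolean scanOutgoing = outCount <= inCount;
        Iterator<Edge> it = scanOutgoing ? from.out.iterator() : to.in.iterator();

        while (it.hasNext()) {
            Edge e = it.next();
            // Check the endpoint on the far side of the list being scanned.
            boolean endpointsMatch = scanOutgoing ? e.to == to : e.from == from;
            if (endpointsMatch && label.equals(e.label)) {
                return e; // stop at the first match; the rest is never visited
            }
        }
        return null;
    }
}
```

In the _hive_table_ to _hive_column_ scenario, the table may have many outgoing edges while each column has only a few incoming ones, so this sketch would scan the column's incoming edges. In the _hive_process_execution_ to _hive_table_ scenario it would scan the outgoing edges of the process execution instead.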