[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Mestry updated ATLAS-3762:
-----------------------------------
    Description: 
*Background*

One of the earlier commits replaced the vertex and edge fetch with _StreamSupport.stream_. That code uses _Collect(toList)_, which causes all contents to be fetched, so a large amount of data is loaded even when only one element is needed.

*Solution*

Switch to iterators, which load lazily.

*Edge Fetch Refactoring*

Change _getEdge_ to iterate over the smaller dataset.

Here are the scenarios:
 - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. The outgoing edges from _fromVertex_ will be many more than the incoming edges to _toVertex_.
 - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. The outgoing edges from _fromVertex_ will be fewer than the incoming edges to _hive_table_.

Approach:
 * Since this is a linear search, it is more efficient to iterate over fewer items than over more items.
 * Fetch the edge counts for _fromVertex_ and _toVertex_. If either count is 0, return NULL, since no edge between the two vertices can be found.
 * Otherwise (both counts are non-zero), take the side with fewer edges and search it.

[~sidharthkmishra] Thanks for this simple but effective fix.

  was:
*Background*

One of the earlier commits replaced vertices and edges fetch with _StreamSupport.stream_. This uses _Collect(toList),_ which causes all contents to be fetched.

Using this causes large amount of data to be fetched.

*Solution*

Switch to iterators that will use lazy loading.

*Minor Refactoring*

Change the _getEdge_ to iterate on smaller dataset.

[~sidharthkmishra] Thanks for this simple but effective fix.

        Summary: Entity Creation: Improve Edges Fetch Between Vertices  (was: Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators)

> Entity Creation: Improve Edges Fetch Between Vertices
> -----------------------------------------------------
>
>                 Key: ATLAS-3762
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3762
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>         Attachments: ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
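
For illustration, the approach described in the issue (compare the two edge counts, bail out when either is zero, then walk only the smaller side with an iterator that stops at the first match) could be sketched roughly as below. This is not the attached patch: _Vertex_, _Edge_, _connect_ and _findEdge_ are hypothetical in-memory stand-ins rather than Atlas or graph-database classes, and the counts here are plain list sizes, whereas counting edges in a real graph store may have its own cost.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EdgeLookupSketch {
    // Hypothetical in-memory vertex; not an Atlas class.
    static final class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>(); // outgoing edges
        final List<Edge> in = new ArrayList<>();  // incoming edges
        Vertex(String name) { this.name = name; }
    }

    // Hypothetical in-memory edge; not an Atlas class.
    static final class Edge {
        final Vertex from;
        final Vertex to;
        final String label;
        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Helper for building test graphs: registers the edge on both endpoints.
    static Edge connect(Vertex from, Vertex to, String label) {
        Edge e = new Edge(from, to, label);
        from.out.add(e);
        to.in.add(e);
        return e;
    }

    // Returns the edge from -> to with the given label, or null if none exists.
    static Edge findEdge(Vertex from, Vertex to, String label) {
        int outCount = from.out.size();
        int inCount = to.in.size();

        // If either side has no edges, nothing can connect the two vertices.
        if (outCount == 0 || inCount == 0) {
            return null;
        }

        // Linear search, so walk whichever side has fewer edges.
        Iterator<Edge> it = (outCount <= inCount) ? from.out.iterator() : to.in.iterator();

        // Iterate lazily and stop at the first match instead of collecting everything.
        while (it.hasNext()) {
            Edge e = it.next();
            if (e.from == from && e.to == to && e.label.equals(label)) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // hive_table -> many hive_column vertices: the column side is the smaller one.
        Vertex table = new Vertex("hive_table");
        Vertex lastColumn = null;
        Edge lastEdge = null;
        for (int i = 0; i < 1000; i++) {
            lastColumn = new Vertex("hive_column_" + i);
            lastEdge = connect(table, lastColumn, "columns");
        }
        // Searches the single incoming edge of the column, not the 1000 outgoing ones.
        System.out.println(findEdge(table, lastColumn, "columns") == lastEdge);
    }
}
```

In the _hive_table_ to _hive_column_ scenario the search inspects the column's one incoming edge rather than the table's many outgoing edges; in the _hive_process_execution_ to _hive_table_ scenario the choice flips to the outgoing side.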