Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Madhan Neethiraj Wed, 25 Jan 2017 10:06:10 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------



Ship it!




The fix looks good!

With this patch, the following DSL query returns in about 300 ms, compared to 
about 50 seconds earlier, on a store with ~70,000 hive_columns:

  hive_column where qualifiedName='default.testtable_772.col507@cl1'

- Madhan Neethiraj


On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 9:32 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently, DSL uses a fill function during Gremlin translation to merge 
> results by typeName and superTypeName, and the fill function loads the 
> resulting vertices into memory. This causes significant memory usage, and the 
> Atlas server spends a lot of time doing GC instead of useful work, sometimes 
> resulting in OOM (when GC is not able to recover and search queries are run 
> in parallel).
> The proposal is to replace this with typeName checks, by finding all the 
> subtypes of a given type and using an IN clause in the filter.
> For example:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98 
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran all unit tests; all passed.
> Ran a search query on hive_column with 100,000 entities; query time improved 
> from 45 sec to 0.5 sec.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>
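
The idea in the quoted description (expand a type to itself plus all of its subtypes, then filter with a single IN clause on "__typeName") can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class name TypeNameFilterSketch, both method names, and the directSubTypes map are hypothetical, and the real change presumably derives the subtype set from the Atlas type system rather than from a plain map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TypeNameFilterSketch {

    // Returns the given type plus all of its transitive subtypes.
    // directSubTypes (type name -> immediate subtypes) is a hypothetical
    // stand-in for whatever the type system provides.
    public static Set<String> typeAndSubTypes(String typeName,
                                              Map<String, List<String>> directSubTypes) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(typeName);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            // add() returns false for names already seen, which guards against cycles
            if (result.add(current)) {
                for (String sub : directSubTypes.getOrDefault(current, List.of())) {
                    pending.push(sub);
                }
            }
        }
        return result;
    }

    // Renders the type-name check as a Gremlin 2 style filter string,
    // mirroring the shape of the example in the description.
    public static String typeNameFilter(Set<String> typeNames) {
        String quoted = typeNames.stream()
                .map(name -> "'" + name + "'")
                .collect(Collectors.joining(","));
        return "g.V.has(\"__typeName\", T.in, [" + quoted + "])";
    }

    public static void main(String[] args) {
        Map<String, List<String>> subTypes = Map.of("Person", List.of("Manager"));
        System.out.println(typeNameFilter(typeAndSubTypes("Person", subTypes)));
    }
}
```

The point of the design is that the type check becomes part of the graph filter itself, so matching vertices do not have to be loaded into memory and merged afterwards, which is what the description identifies as the source of the GC pressure.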
