-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------

Ship it!


The fix looks good! With this patch, the following DSL query returns in about 300 ms, compared to about 50 seconds earlier, on a store with ~70,000 hive_columns:

hive_column where qualifiedName='default.testtable_772.col507@cl1'

- Madhan Neethiraj


On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 25, 2017, 9:32 a.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently, DSL uses a fill function during Gremlin translation to merge
> results by typeName and superTypeName, and the fill function loads the
> resulting vertices into memory. This causes significant memory usage, and the
> ATLAS server spends a lot of time doing GC instead of useful work, sometimes
> resulting in OOM errors (when GC is not able to recover and search queries
> are run in parallel).
> The proposal is to replace this with typeName checks: find all the subtypes
> of a given type and use an IN clause in the filter.
> For example:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in,
> ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560))
> [0..<40].toList()}
>
>
> Diffs
> -----
>
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
>   repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran all unit tests; all passed.
> Ran a search query on hive_column with 100,000 entities; performance improved
> from 45 sec to 0.5 sec.
>
>
> Thanks,
>
> Sarath Subramanian
>
>
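
For readers following the thread, the idea in the description (collect a type together with all of its subtypes, then filter on __typeName with an IN clause) can be sketched roughly as below. This is only an illustration and not the code from the patch: the class TypeFilterSketch, its methods, and the map of direct subtypes are hypothetical names introduced here, and the output string simply mirrors the shape of the Gremlin filter quoted in the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch; not part of Atlas or of the patch under review. */
final class TypeFilterSketch {

    /**
     * Returns typeName followed by all of its transitive subtypes.
     * directSubTypes is an assumed lookup: type name -> names of its direct subtypes.
     */
    static Set<String> typeAndSubTypes(String typeName,
                                       Map<String, List<String>> directSubTypes) {
        Set<String> result = new LinkedHashSet<>(); // keeps discovery order, drops duplicates
        Deque<String> pending = new ArrayDeque<>();
        pending.add(typeName);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            // Expand a type only the first time it is seen.
            if (result.add(current)) {
                pending.addAll(directSubTypes.getOrDefault(current, List.of()));
            }
        }
        return result;
    }

    /** Builds a filter string shaped like the one in the description's example. */
    static String typeFilter(Set<String> typeNames) {
        String names = typeNames.stream()
                .map(name -> "'" + name + "'")
                .collect(Collectors.joining(","));
        return "g.V.has(\"__typeName\", T.in, [" + names + "])";
    }
}
```

With a hierarchy in which Manager is the only subtype of Person, typeFilter(typeAndSubTypes("Person", ...)) yields g.V.has("__typeName", T.in, ['Person','Manager']), so the type check becomes a single filter on the traversal instead of merging vertex sets that were first loaded into memory.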
