[ https://issues.apache.org/jira/browse/ATLAS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049486#comment-16049486 ]
Christian R edited comment on ATLAS-1868 at 6/14/17 6:28 PM:
-------------------------------------------------------------

Hi,

One certainly needs a better understanding of Gremlin than I have to come up with a general optimization strategy. In this simple case there is one property that Atlas knows to be unique (id), and it could make sense to base the query on that.

In the more general case of has(typename, a).relationIn(typename, b), my intuition is that it is better to start with the typename that has the fewest entities and flip the relationship direction if necessary. I do not know whether that holds in general. Lucene can tell you very quickly whether there are more typename-a entities than typename-b entities, and Atlas could just as well keep an internal statistic on this.

https://github.com/apache/incubator-atlas/blob/19d344cd87eeee19b011f181f88ee5696470138d/repository/src/main/java/org/apache/atlas/gremlin/optimizer/GremlinQueryOptimizer.java sounds like a good place to start :)

edit: The matrix chain multiplication algorithm keeps popping up in the back of my head, as if it would be best to solve parts of the query in a way that generates as few nodes as possible and then base new queries on those small result sets. I doubt it would work, but maybe it gives you some ideas.

> Highly inefficient DSL-queries
> ------------------------------
>
>                 Key: ATLAS-1868
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1868
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core
>    Affects Versions: 0.7-incubating
>         Environment: linux, hbase + solr configuration.
>            Reporter: Christian R
>              Labels: dsl, gremlin
>
> The DSL query 'mytype where property.id = "id1"' appears to be rewritten as a
> Gremlin query that resembles:
> g.V.has('typename', 'mytype').as('x').out('property').has('id', 'id1').back('x')
> On our system this query takes 6-7 minutes. The query
> g.V.has('id', 'id1').in('property').has('typename', 'mytype')
> takes 350 milliseconds.
> Our graph:
> g.V.count() = 1359151
> We have Atlas 0.7 installed. I've compiled the latest 0.9 code and looked at
> the generated Gremlin query, as reported in the logs, for the same DSL query,
> and I think 0.9 has the same performance issues. Unfortunately I don't have a
> big graph on a 0.9 installation to test performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
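The gap between 6-7 minutes and 350 milliseconds in the issue most likely comes down to where the traversal starts. The minimal sketch below uses a hypothetical in-memory graph (`ToyGraph`, not Atlas or Titan code) in which every `has()` lookup is served from an index. It counts the vertices each ordering touches. The type-first form visits every `mytype` vertex and its neighbours, while the id-first form starts from the single vertex carrying the unique id. The property names `typename` and `id` and the edge label `property` follow the issue text. Results are collected into a set, which stands in for Gremlin's own de-duplication semantics.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.props = {}                     # vertex -> {key: value}
        self.out_edges = defaultdict(list)  # (vertex, label) -> [targets]
        self.in_edges = defaultdict(list)   # (vertex, label) -> [sources]
        self.index = defaultdict(set)       # (key, value) -> {vertices}, like an indexed has()

    def add_vertex(self, v, **props):
        self.props[v] = props
        for key, value in props.items():
            self.index[(key, value)].add(v)

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)
        self.in_edges[(dst, label)].append(src)


def type_first(g, typename, label, id_value):
    """Mimics g.V.has('typename', t).as('x').out(label).has('id', v).back('x')."""
    visited, result = 0, set()
    for v in g.index[("typename", typename)]:
        visited += 1
        for w in g.out_edges[(v, label)]:
            visited += 1
            if g.props[w].get("id") == id_value:
                result.add(v)  # "back('x')": keep the start vertex
    return result, visited


def id_first(g, typename, label, id_value):
    """Mimics g.V.has('id', v).in(label).has('typename', t)."""
    visited, result = 0, set()
    for w in g.index[("id", id_value)]:
        visited += 1
        for v in g.in_edges[(w, label)]:
            visited += 1
            if g.props[v].get("typename") == typename:
                result.add(v)
    return result, visited
```

With 1000 `mytype` vertices, each linked to its own property vertex, both functions return the same vertex. The type-first version touches 2000 vertices and the id-first version touches 2. On a real graph the ratio also depends on fan-out and on which properties are indexed.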
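The cardinality heuristic from the comment can be sketched as follows. The sketch assumes that per-type vertex counts are available, for example from an index count query or from an internal statistic kept by Atlas, as suggested above. `Plan`, `choose_plan`, and the `counts` mapping are hypothetical names, not Atlas APIs. The sketch deliberately ignores edge fan-out, which a real cost model would also have to weigh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    start_type: str  # typename the traversal starts from
    direction: str   # "out" or "in", as seen from start_type
    label: str       # relationship (edge) label
    end_type: str    # typename the traversal must end on


FLIP = {"out": "in", "in": "out"}


def choose_plan(type_a, direction, label, type_b, counts):
    """Plan for: vertices of type_a linked to type_b by `label` edges,
    where `direction` is seen from type_a.

    `counts` is a hypothetical typename -> estimated vertex count mapping.
    The traversal starts from the smaller side, flipping the edge direction
    when it starts from type_b instead of type_a.
    """
    if direction not in FLIP:
        raise ValueError(f"direction must be 'out' or 'in', got {direction!r}")
    if counts[type_a] <= counts[type_b]:
        return Plan(type_a, direction, label, type_b)
    # Fewer type_b vertices: walk the same edges from the other end.
    return Plan(type_b, FLIP[direction], label, type_a)
```

For example, with `counts = {"mytype": 1_000_000, "owner": 10}`, calling `choose_plan("mytype", "in", "owns", "owner", counts)` yields `Plan("owner", "out", "owns", "mytype")`. When the plan is flipped, the wanted `type_a` vertices are the end of the traversal rather than the start, so a rewritten query would return (and de-duplicate) the end vertices instead of stepping back to the start.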
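For reference, the matrix chain multiplication algorithm mentioned in the edit is a dynamic program that picks the cheapest order in which to combine a chain of steps. Below is the textbook version. Reading `dims` as estimated intermediate result sizes along a query path is only the analogy the comment hints at. It is not something Atlas is known to do, and graph traversal costs do not follow the matrix cost formula exactly.

```python
def matrix_chain_order(dims):
    """Matrix i (0-based) has shape dims[i] x dims[i + 1].

    Returns (minimum scalar multiplications, split table).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # split between k and k + 1
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost[0][n - 1], split


def parenthesize(split, i, j):
    """Render the optimal grouping of matrices i..j."""
    if i == j:
        return f"A{i}"
    k = split[i][j]
    return f"({parenthesize(split, i, k)} {parenthesize(split, k + 1, j)})"
```

For `dims = [10, 30, 5, 60]`, the result is 4500 multiplications via `((A0 A1) A2)`, compared with 27000 for `(A0 (A1 A2))`. The cheap order first produces the small 10 x 5 intermediate, which is the "generate as few nodes as possible first" intuition from the comment.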