[jira] [Comment Edited] (ATLAS-1868) Highly inefficient DSL-queries

Christian R (JIRA) Wed, 14 Jun 2017 11:29:30 -0700

    [ https://issues.apache.org/jira/browse/ATLAS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049486#comment-16049486 ]


Christian R edited comment on ATLAS-1868 at 6/14/17 6:28 PM:
-------------------------------------------------------------

Hi, 

one certainly needs a better understanding of Gremlin than I have to come up 
with a general optimization strategy. In this simple case there is one property 
that Atlas knows to be unique (id), and it could make sense to base the query on 
that. In the more general case of has(typename, a).relationIn(typename, b), my 
intuition is that it is better to start with the typename that has the fewest 
entities and flip the relationship direction if necessary, but I do not know 
whether that actually holds in general. Lucene can tell you very quickly 
whether there are more typename-a than typename-b entities (Atlas could also 
keep internal statistics on this).
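
To make the "start from the rarer side" intuition concrete, here is a rough 
sketch in Python that only builds Gremlin-style strings. It is not Atlas code: 
the Hop class, the plan function and the count callback are hypothetical names, 
the callback stands in for whatever index lookup or cached statistic would 
supply the entity counts, and the numbers in the usage lines are made up purely 
for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hop:
    """One has(...) -> edge -> has(...) pattern (hypothetical helper type)."""
    left_key: str     # property tested on the vertex the DSL query returns
    left_value: str
    edge_label: str   # edge followed from the left vertex with out()
    right_key: str    # property tested on the neighbouring vertex
    right_value: str


def plan(hop: Hop, count: Callable[[str, str], int]) -> str:
    """Build a Gremlin-style query that starts from the side with fewer matches.

    `count(key, value)` is an assumed callback returning an estimate of how
    many vertices have that property value.
    """
    left_n = count(hop.left_key, hop.left_value)
    right_n = count(hop.right_key, hop.right_value)
    if left_n <= right_n:
        # Left side is rarer: keep the original direction and step back to
        # the left vertices once the neighbour filter has been applied.
        return (
            f"g.V.has('{hop.left_key}', '{hop.left_value}').as('x')"
            f".out('{hop.edge_label}')"
            f".has('{hop.right_key}', '{hop.right_value}').back('x')"
        )
    # Right side is rarer: start there and follow the edge backwards.
    return (
        f"g.V.has('{hop.right_key}', '{hop.right_value}')"
        f".in('{hop.edge_label}')"
        f".has('{hop.left_key}', '{hop.left_value}')"
    )


# Illustrative, made-up counts: many 'mytype' vertices, one vertex with id1.
stats = {("typename", "mytype"): 500000, ("id", "id1"): 1}
hop = Hop("typename", "mytype", "property", "id", "id1")
print(plan(hop, lambda key, value: stats[(key, value)]))
# prints: g.V.has('id', 'id1').in('property').has('typename', 'mytype')
```

With those counts the sketch produces the same shape as the fast query from 
the issue description. Whether the same rule pays off for longer chains of 
relations is exactly the part I am unsure about.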

https://github.com/apache/incubator-atlas/blob/19d344cd87eeee19b011f181f88ee5696470138d/repository/src/main/java/org/apache/atlas/gremlin/optimizer/GremlinQueryOptimizer.java
 sounds like a good place to start :) 


edit: the matrix chain multiplication algorithm keeps popping up in the back of 
my head, as if it would be best to solve parts of the query in a way that 
generates as few nodes as possible and then base new queries on those small 
result sets. I doubt it would work, but maybe it gives you some ideas. 


> Highly inefficient DSL-queries
> ------------------------------
>
>                 Key: ATLAS-1868
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1868
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: 0.7-incubating
>         Environment: linux, hbase + solr configuration.
>            Reporter: Christian R
>              Labels: dsl, gremlin
>
> The DSL query 'mytype where property.id = "id1"' appears to be rewritten as a 
> gremlin query that resembles:
> g.V.has('typename', 'mytype').as('x').out('property').has('id', 'id1').back('x')
> On our system this query takes 6-7 minutes. The query
> g.V.has('id', 'id1').in('property').has('typename', 'mytype')
> takes 350 milliseconds.
> Our graph:
> g.V.count() = 1359151
> We have Atlas 0.7 installed. I've compiled the latest 0.9 code and looked at 
> the generated gremlin query as reported in the logs for the same DSL-query, 
> and I think 0.9 has the same performance issues. Unfortunately I don't have a 
> big graph on a 0.9 installation to test performance. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
