How many rows does this return?

MATCH (n:Topic), (m:Topic), p = (n)-[*0..2]-(m)
where n.name = 'Topic1' and m.name = 'Topic2'
with p, n, m
return p, count(*)
order by count(*);

Your aggregation was only on the same paths, so you get 9 different paths, but you didn't show the counts per path.

and obtain 9 rows in 182799 ms

On Tue, Oct 14, 2014 at 10:59 AM, gg4u <[email protected]> wrote:
> Yes:
>
> neo4j-sh (?)$ profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1'
> and m.name = 'Topic2' MATCH p = (n)-[*0..2]-(m) return p,
> reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity)
> AS pathProximity order by pathProximity DESC LIMIT 6;
> ==>
> [...results...]
> ==> 6 rows
> ==>
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +PatternMatcher
> ==>           |
> ==>           +SchemaIndex(0)
> ==>             |
> ==>             +SchemaIndex(1)
> ==>
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==> | Operator       | Rows | DbHits | Identifiers       | Other                                           |
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==> | ColumnFilter   |    6 |      0 |                   | keep columns p, pathProximity                   |
> ==> | Top            |    6 |      0 |                   | { AUTOINT3}; Cached(pathProximity of type Any)  |
> ==> | Extract        |    9 |     36 |                   | pathProximity                                   |
> ==> | ExtractPath    |    9 |      0 | p                 |                                                 |
> ==> | PatternMatcher |    9 |      0 | n, m, UNNAMED94   |                                                 |
> ==> | SchemaIndex(0) |    1 |      2 | m, m              | { AUTOSTRING1}; :Topic(name)                    |
> ==> | SchemaIndex(1) |    1 |      2 | n, n              | { AUTOSTRING0}; :Topic(name)                    |
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==>
> neo4j-sh (?)$
>
>
> On Tuesday, October 14, 2014 10:00:29 UTC+2, Michael Hunger wrote:
>>
>> Can you try this:
>>
>> profile
>> MATCH (n:Topic), (m:Topic)
>> where n.name = 'Topic1' and m.name = 'Topic2'
>> MATCH p = (n)-[*0..2]-(m)
>> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
>> order by pathProximity DESC
>> LIMIT 6
>>
>>
>> On Tue, Oct 14, 2014 at 9:06 AM, gg4u
<[email protected]> wrote:
>>
>>> Hi Rodger,
>>>
>>> thank you for your insights!
>>> Please see my comments below:
>>>
>>> On Monday, October 13, 2014 18:37:50 UTC+2, Rodger wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've done a lot of RDBMS performance tuning.
>>>> Just a few quick thoughts.
>>>>
>>>>
>>>> Be sure to run the queries in the shell, if you are not already doing
>>>> so.
>>>>
>>>
>>> Yes, they are run in the shell:
>>> http://localhost:7474/webadmin/#/console/
>>>
>>>
>>>> How many rows are returned? Just sorting, then returning many rows,
>>>> takes a long time to scroll them to output.
>>>>
>>>
>>> 9 rows
>>> In the answer above, I wrote 9 paths
>>>
>>>
>>>> If you are getting duplicates, it may be the equivalent of a cartesian
>>>> product,
>>>> one of the worst things that can happen in RDBMS, and also one
>>>> of the least known. See my presentation on them here:
>>>> http://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/
>>>>
>>>
>>> So I had a look at your pdf,
>>> http://rodgersnotes.files.wordpress.com/2010/09/cartprodwordpress.pdf
>>> page 11
>>>
>>> and I think the idea you want to suggest is to avoid duplicates (you
>>> called them 'cartesian products') by enforcing conditions.
>>> However, since this is a graph db and not a relational one, it is not clear to me where
>>> this applies, because in the graph db I don't have 'joined' queries between
>>> tables,
>>> so the conditions I have are, at least in my case, properties (index on
>>> properties), and non-directional rels.
>>>
>>>
>>>>
>>>> Try:
>>>>
>>>> return p, count (*)
>>>> order by count(*)
>>>>
>>>
>>> I run:
>>>
>>> profile MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name =
>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order by
>>> count(*);
>>>
>>> and I've got: (see there are also duplicates in paths: is it because I
>>> have both (a)-[]->(b) and (a)<-[]-(b) ?)
>>>
>>> ==> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>>> ==> | p                                                                                                                                                                                                    | count(*) |
>>> ==> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[71185298]{proximity:68},Node[1401899]{id:21375850,name:"Topic3"},:P_Topic_Link[71185313]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]  |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[88675719]{proximity:28},Node[2594397]{id:31760062,name:"Topic4"},:P_Topic_Link[88675745]{proximity:23},Node[1386672]{id:21245,name:"Topic2"}]  |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[30736000]{proximity:32},Node[2515502]{id:3106745,name:"Topic5"},:P_Topic_Link[30735974]{proximity:82},Node[1386672]{id:21245,name:"Topic2"}]   |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[68206383]{proximity:72},Node[1202629]{id:19635605,name:"Topic6"},:P_Topic_Link[68206440]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]  |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[98898126]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}] |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[58107755]{proximity:55},Node[506613]{id:13841207,name:"Topic8"},:P_Topic_Link[58107766]{proximity:27},Node[1386672]{id:21245,name:"Topic2"}]   |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[1025873]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}]  |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]      |        1 |
>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]      |        1 |
>>> ==> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>>> ==> 9 rows
>>> ==>
>>> ==> ColumnFilter(0)
>>> ==>   |
>>> ==>   +Sort
>>> ==>     |
>>> ==>     +EagerAggregation
>>> ==>       |
>>> ==>       +ColumnFilter(1)
>>> ==>         |
>>> ==>         +ExtractPath
>>> ==>           |
>>> ==>           +Filter
>>> ==>             |
>>> ==>             +TraversalMatcher
>>> ==>
>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>> ==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                                           |
>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>> ==> | ColumnFilter(0)  |       9 |       0 |             | keep columns p, count(*)                                                        |
>>> ==> | Sort             |       9 |       0 |             | Cached( INTERNAL_AGGREGATE931614f3-4def-4fc4-a80b-c6fca3839817 of type Integer) |
>>> ==> | EagerAggregation |       9 |       0 |             | p                                                                               |
>>> ==> | ColumnFilter(1)  |       9 |       0 |             | keep columns p, n, m                                                            |
>>> ==> | ExtractPath      |       9 |       0 | p           |                                                                                 |
>>> ==> | Filter           |       9 | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})                |
>>> ==> | TraversalMatcher | 1010795 | 1024307 |             | m, UNNAMED36, m                                                                 |
>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>> ==>
>>>
>>>>
>>>> Without me looking at the raw data, and the query result, you
>>>> seem to have many operations going on. So, you have a lot of rows in
>>>> the profile output.
>>>>
>>>
>>> Only 9
>>>
>>>
>>>> As a general rule, the more rows there are in the
>>>> profile, the slower the response time is.
>>>> ie. the more complex the query, the slower it is.
>>>>
>>>>
>>>> If I were looking at this, I would try to isolate which part of
>>>> the query is the slow part. The Return clause, or the Match clause?
>>>>
>>>>
>>>> You've already tried the response times with the data.
>>>> Try to simply:
>>>> return count(*) .
>>>>
>>>
>>> I ran:
>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name =
>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order by
>>> count(*);
>>>
>>> and obtained 9 rows in 182799 ms
>>>
>>> I ran:
>>> MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name =
>>> 'Topic2' with n, m return count(*);
>>>
>>> and obtained the result in 856 ms
>>>
>>>
>>> profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name =
>>> 'Topic2' with n, m return count(*);
>>>
>>> results in:
>>>
>>>
>>> ==> ColumnFilter
>>> ==>   |
>>> ==>   +EagerAggregation
>>> ==>     |
>>> ==>     +SchemaIndex(0)
>>> ==>       |
>>> ==>       +SchemaIndex(1)
>>> ==>
>>> ==> +------------------+------+--------+-------------+-------------------------------+
>>> ==> | Operator         | Rows | DbHits | Identifiers | Other                         |
>>> ==> +------------------+------+--------+-------------+-------------------------------+
>>> ==> | ColumnFilter     |    1 |      0 |             | keep columns count(*)         |
>>> ==> | EagerAggregation |    1 |      0 |             |                               |
>>> ==> | SchemaIndex(0)   |    1 |      2 | m, m        | { AUTOSTRING1}; :Topic(name)  |
>>> ==> | SchemaIndex(1)   |    1 |      2 | n, n        | { AUTOSTRING0}; :Topic(name)  |
>>> ==> +------------------+------+--------+-------------+-------------------------------+
>>>
>>>
>>>> How many seconds response time is that, versus the original query?
>>>> What is the resulting profile?
>>>>
>>>
>>> So, it looks like it actually takes a huge amount of time to traverse the graph,
>>> while it takes a reasonable time (~900 ms) to match a node on its full name string.
>>>
>>> *Any idea for improving the performance of the traversal?*
>>>
>>> *It is a real problem, since I hit the same issue even when fetching the first neighbors
>>> of a node, which currently makes this unfeasible for
>>> production:*
>>> *Has anyone with a real case of a graph of similar size and structure tried to
>>> perform a similar query?*
>>>
>>> As an example, this query to obtain the first neighbors of node Topic44:
>>>
>>> MATCH (n:Topic) , (m), p = (n)-[*0..1]-(m)
>>> where n.name = 'Topic44'
>>> with p, n, m
>>> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
>>> order by pathProximity DESC LIMIT 6
>>>
>>> returns
>>> 6 rows in ~65000 ms VS 6 rows in less than a second with a NoSQL.
>>>
>>> Any idea?
>>>
>>> Thank you guys for helping! I hope to find a solution soon.
>>>
>>>
>>>>
>>>> See also the tuning presentations I've done:
>>>> http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/
>>>> http://rodgersnotes.wordpress.com/2014/06/08/tuning-the-untunable-when-indexes-and-optimizer-dont-help-2/
>>>> They are quick reads.
>>>>
>>>
>>> Thank you, I have seen them;
>>> they are mostly about SQL tuning.
>>> I've just used the neo4j structure to store a graph with the same label on 4M
>>> topics (I MUST keep it with one label), with an index on the topic(name) property, and
>>> used cypher to query the db.
>>> This is my data structure.
>>>
>>>> I've put a number of principles in there that you might
>>>> apply.
>>>> ie. Could you create the NEO4J equivalent of a temp table?
>>>>
>>>>
>>>> Hope this helps.
>>>>
>>>>
>>>> On Thursday, October 9, 2014 2:41:47 AM UTC-5, gg4u wrote:
>>>>>
>>>>> Hi Michael, thank you.
>>>>> Sure, I post my profile result here below!
>>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
