Hi Rodjer, thank you for your insights! please see comments below:
Il giorno lunedì 13 ottobre 2014 18:37:50 UTC+2, Rodger ha scritto: > > Hello, > > I've done a lot of RDBMS performance tuning. > Just a few quick thoughts. > > > Be sure to run the queries in the shell, if you are not already doing so. > > Yes, they are run in the shell: http://localhost:7474/webadmin/#/console/ > How many rows are returned? Just sorting, then returning many rows, > takes a long time to scroll them to output. > > > 9 rows In the answer above, I wrote 9 paths > > If you are getting duplicates, it may be the equivalent of a cartesian > product, > one of the worst things that can happen in RDBMS, and also one > of the least known. See my presentation on them here: > > http://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/ > <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2010%2F09%2F15%2Fstamping-out-cartesian-products%2F&sa=D&sntz=1&usg=AFQjCNHJDOJ0IOsI6XRsg_9yuTscI4mqtQ> > So I had a look at your pdf, http://rodgersnotes.files.wordpress.com/2010/09/cartprodwordpress.pdf page 11 and I think the idea you want to suggest, is to avoid duplicates (you called them 'cartesian products') by enforcing conditions. Though, since it is a graph db and not relational, not clear to me where this applies because in the graph db I don't have 'jointed' queries between tables, so the conditions I have are, at least in my case, properties (index on properties), and no-directional rels. > > > Try: > > return p, count (*) > order by count(*) > I run: profile MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order by count(*); and I've got: (see there are also duplicates in paths: is it because I have both (a)-[]->(b) and (a)<-[]-(b) ?) ==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ==> | p | count(*) | ==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[71185298]{proximity:68},Node[1401899]{id:21375850,name:"Topic3"},:P_Topic_Link[71185313]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[88675719]{proximity:28},Node[2594397]{id:31760062,name:"Topic4"},:P_Topic_Link[88675745]{proximity:23},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[30736000]{proximity:32},Node[2515502]{id:3106745,name:"Topic5"},:P_Topic_Link[30735974]{proximity:82},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[68206383]{proximity:72},Node[1202629]{id:19635605,name:"Topic6"},:P_Topic_Link[68206440]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[98898126]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[58107755]{proximity:55},Node[506613]{id:13841207,name:"Topic8"},:P_Topic_Link[58107766]{proximity:27},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[1025873]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 1 | ==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ==> 9 rows ==> ==> ColumnFilter(0) ==> | ==> +Sort ==> | ==> +EagerAggregation ==> | ==> +ColumnFilter(1) ==> | ==> +ExtractPath ==> | ==> +Filter ==> | ==> +TraversalMatcher ==> ==> +------------------+---------+---------+-------------+----------------------------------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +------------------+---------+---------+-------------+----------------------------------------------------------------------------------+ ==> | ColumnFilter(0) | 9 | 0 | | keep columns p, count(*) | ==> | Sort | 9 | 0 | | Cached( INTERNAL_AGGREGATE931614f3-4def-4fc4-a80b-c6fca3839817 of type Integer) | ==> | EagerAggregation | 9 | 0 | | p | ==> | ColumnFilter(1) | 9 | 0 | | keep columns p, n, m | ==> | ExtractPath | 9 | 0 | p | | ==> | Filter | 9 | 3032385 | | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1}) | ==> | TraversalMatcher | 1010795 | 1024307 | | m, UNNAMED36, m | ==> +------------------+---------+---------+-------------+----------------------------------------------------------------------------------+ ==> > > > > Without me looking at the raw data, and the query result, you > seem to have many operations going on. So, you have a lot of rows in > the profile output. > Only 9 > As a general rule, the more rows there are in the > profile, the slower the response time is. > ie. the more complex the query, the slower it is. > > > If I were looking at this, I would try to isolate which part of > the query is the slow part. The Return clause, or the Match clause? > > > You've already tried the response times with the data. > Try to simply: > return count(*) . > I run: MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order by count(*); and obtain 9 rows in 182799 ms I run: MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name = 'Topic2' with n, m return count(*); and obtain 856ms profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name = 'Topic2' with n, m return count(*); results in: ==> ColumnFilter ==> | ==> +EagerAggregation ==> | ==> +SchemaIndex(0) ==> | ==> +SchemaIndex(1) ==> ==> +------------------+------+--------+-------------+-------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +------------------+------+--------+-------------+-------------------------------+ ==> | ColumnFilter | 1 | 0 | | keep columns count(*) | ==> | EagerAggregation | 1 | 0 | | | ==> | SchemaIndex(0) | 1 | 2 | m, m | { AUTOSTRING1}; :Topic(name) | ==> | SchemaIndex(1) | 1 | 2 | n, n | { AUTOSTRING0}; :Topic(name) | ==> +------------------+------+--------+-------------+-------------------------------+ > How many seconds response time is that, versus the original query? > What is the resulting profile? > > > So, it looks like it actually take huge time in traversing the graph, while reasonable time '~900ms' to match a fullstring node. *Any idea for improving performance of traversal??* *It is a real problem, since also for getting results of first neighbors of a node, I met the same problem which makes currently unfeasible for production :* *Anyone with real case of similar size graph and structure trying to perform a similar query?* as example, this query to obtain first neighbors of node Topic44: MATCH (n:Topic) , (m), p = (n)-[*0..1]-(m) where n.name = 'Topic44' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity DESC LIMIT 6 returns 6 rows in ~65000 ms VS 6 rows in less than a second with a NoSQL. Any idea? thank you guys for helping!! Hope to find a solution soon.. > > > See also the tuning presentations I've done: > http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/ > <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2010%2F09%2F14%2Foracle-performance-tuning%2F&sa=D&sntz=1&usg=AFQjCNE0XK_XcNk5YBj806h6a1OJHr0glA> > > http://rodgersnotes.wordpress.com/2014/06/08/tuning-the-untunable-when-indexes-and-optimizer-dont-help-2/ > > <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2014%2F06%2F08%2Ftuning-the-untunable-when-indexes-and-optimizer-dont-help-2%2F&sa=D&sntz=1&usg=AFQjCNFgTfu5bnjPw6boHWttJpzQBtaNgw> > They are quick reads. > > thank you, seen them, they are about SQL tuning mostly: I've just used neo4j strucutre to store a graph with same label on 4M topics (I MUST keep it with one label), index on topic(name) property and used cypher to query the db, this is my data structure. I've put a number of principles and principles in there, that you might > apply. > ie. Could you create the NEO4J equivalent of a temp table? > > > Hope this helps. > > > On Thursday, October 9, 2014 2:41:47 AM UTC-5, gg4u wrote: >> >> Hi Micheal, thank you. >> sure I post my profile result here below ! >> >> >>> -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
