Re: [Neo4j] Traversing Large (weighted) graphs: performance, data structure, indexes

gg4u Tue, 14 Oct 2014 00:06:25 -0700

Hi Rodjer,

thank you for your insights!
please see comments below:


Il giorno lunedì 13 ottobre 2014 18:37:50 UTC+2, Rodger ha scritto:
>
> Hello,
>
> I've done a lot of RDBMS performance tuning.
> Just a few quick thoughts.
>
>
> Be sure to run the queries in the shell, if you are not already doing so.
>
>
Yes, they are run in the shell:
http://localhost:7474/webadmin/#/console/
 

> How many rows are returned? Just sorting, then returning many rows, 
> takes a long time to scroll them to output. 
>
>
>
9 rows
In the answer above, I wrote 9 paths

 

>
> If you are getting duplicates, it may be the equivalent of a cartesian 
> product, 
> one of the worst things that can happen in RDBMS, and also one
> of the least known. See my presentation on them here:
>
> http://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/ 
> <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2010%2F09%2F15%2Fstamping-out-cartesian-products%2F&sa=D&sntz=1&usg=AFQjCNHJDOJ0IOsI6XRsg_9yuTscI4mqtQ>
>

So I had a look at your pdf,
http://rodgersnotes.files.wordpress.com/2010/09/cartprodwordpress.pdf
page 11

and I think the idea you want to suggest, is to avoid duplicates (you 
called them 'cartesian products') by enforcing conditions.
Though, since it is a graph db and not relational, not clear to me where 
this applies because in the graph db I don't have 'jointed' queries between 
tables,
so the conditions I have are, at least in my case, properties (index on 
properties), and no-directional rels.
 

>
>
> Try:
>
> return p, count (*) 
> order by count(*)
>

I run:

profile MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 
'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order by 
count(*);

and I've got: (see there are also duplicates in paths: is it because I have 
both (a)-[]->(b) and (a)<-[]-(b) ?)

==> 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | p                                                                     
                                                                            
                                                                            
              | count(*) |
==> 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[71185298]{proximity:68},Node[1401899]{id:21375850,name:"Topic3"},:P_Topic_Link[71185313]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]
 
                  | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[88675719]{proximity:28},Node[2594397]{id:31760062,name:"Topic4"},:P_Topic_Link[88675745]{proximity:23},Node[1386672]{id:21245,name:"Topic2"}]
 
          | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[30736000]{proximity:32},Node[2515502]{id:3106745,name:"Topic5"},:P_Topic_Link[30735974]{proximity:82},Node[1386672]{id:21245,name:"Topic2"}]
 
| 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[68206383]{proximity:72},Node[1202629]{id:19635605,name:"Topic6"},:P_Topic_Link[68206440]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]
 
             | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[98898126]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}]
 
                       | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[58107755]{proximity:55},Node[506613]{id:13841207,name:"Topic8"},:P_Topic_Link[58107766]{proximity:27},Node[1386672]{id:21245,name:"Topic2"}]
 
                            | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[1025873]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}]
 
                        | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]
 
                 | 1        |
==> | 
[Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]
 
                 | 1        |
==> 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> 9 rows
==> 
==> ColumnFilter(0)
==>   |
==>   +Sort
==>     |
==>     +EagerAggregation
==>       |
==>       +ColumnFilter(1)
==>         |
==>         +ExtractPath
==>           |
==>           +Filter
==>             |
==>             +TraversalMatcher
==> 
==> 
+------------------+---------+---------+-------------+----------------------------------------------------------------------------------+
==> |         Operator |    Rows |  DbHits | Identifiers |                 
                                                           Other |
==> 
+------------------+---------+---------+-------------+----------------------------------------------------------------------------------+
==> |  ColumnFilter(0) |       9 |       0 |             |                 
                                        keep columns p, count(*) |
==> |             Sort |       9 |       0 |             | Cached( 
 INTERNAL_AGGREGATE931614f3-4def-4fc4-a80b-c6fca3839817 of type Integer) |
==> | EagerAggregation |       9 |       0 |             |                 
                                                               p |
==> |  ColumnFilter(1) |       9 |       0 |             |                 
                                            keep columns p, n, m |
==> |      ExtractPath |       9 |       0 |           p |                 
                                                                 |
==> |           Filter |       9 | 3032385 |             |               
 (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
==> | TraversalMatcher | 1010795 | 1024307 |             |                 
                                               m,   UNNAMED36, m |
==> 
+------------------+---------+---------+-------------+----------------------------------------------------------------------------------+
==> 

>
>
>
> Without me looking at the raw data, and the query result, you
> seem to have many operations going on. So, you have a lot of rows in 
> the profile output. 
>

Only 9
 

>  As a general rule, the more rows there are in the 
> profile, the slower the response time is. 
> ie. the more complex the query, the slower it is. 
>
>
> If I were looking at this, I would try to isolate which part of 
> the query is the slow part.  The Return clause, or the Match clause?
>
>
> You've already tried the response times with the data.
> Try to simply: 
> return count(*) .
>

I run:
MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 'Topic1' 
and m.name = 'Topic2' with p, n, m return p, count(*) order by count(*);

and obtain 9 rows in 182799 ms

I run:
MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name = 'Topic2' 
with n, m return count(*);

and obtain 856ms


profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name = 
'Topic2' with n, m return count(*);

results in:


==> ColumnFilter
==>   |
==>   +EagerAggregation
==>     |
==>     +SchemaIndex(0)
==>       |
==>       +SchemaIndex(1)
==> 
==> 
+------------------+------+--------+-------------+-------------------------------+
==> |         Operator | Rows | DbHits | Identifiers |                     
    Other |
==> 
+------------------+------+--------+-------------+-------------------------------+
==> |     ColumnFilter |    1 |      0 |             |         keep columns 
count(*) |
==> | EagerAggregation |    1 |      0 |             |                     
          |
==> |   SchemaIndex(0) |    1 |      2 |        m, m | {  AUTOSTRING1}; 
:Topic(name) |
==> |   SchemaIndex(1) |    1 |      2 |        n, n | {  AUTOSTRING0}; 
:Topic(name) |
==> 
+------------------+------+--------+-------------+-------------------------------+
 

> How many seconds response time is that, versus the original query? 
> What is the resulting profile?
>
>
>

So, it looks like it actually take huge time in traversing the graph,
while reasonable time '~900ms' to match a fullstring node.

*Any idea for improving performance of traversal??*

*It is a real problem, since also for getting results of first neighbors of 
a node, I met the same problem which makes currently unfeasible for 
production :*
*Anyone with real case of similar size graph and structure trying to 
perform a similar query?*

as example, this query to obtain first neighbors of node Topic44:

MATCH (n:Topic) , (m), p = (n)-[*0..1]-(m)
where n.name = 'Topic44' 
with p, n, m
return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
n.proximity) AS pathProximity order by pathProximity DESC LIMIT 6

returns
6 rows in ~65000 ms VS 6 rows in less than a second with a NoSQL.

Any idea?

thank you guys for helping!! Hope to find a solution soon..




>
>
> See also the tuning presentations I've done: 
> http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/ 
> <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2010%2F09%2F14%2Foracle-performance-tuning%2F&sa=D&sntz=1&usg=AFQjCNE0XK_XcNk5YBj806h6a1OJHr0glA>
>
> http://rodgersnotes.wordpress.com/2014/06/08/tuning-the-untunable-when-indexes-and-optimizer-dont-help-2/
>  
> <http://www.google.com/url?q=http%3A%2F%2Frodgersnotes.wordpress.com%2F2014%2F06%2F08%2Ftuning-the-untunable-when-indexes-and-optimizer-dont-help-2%2F&sa=D&sntz=1&usg=AFQjCNFgTfu5bnjPw6boHWttJpzQBtaNgw>
> They are quick reads. 
>
> thank you, seen them, 
they are about SQL tuning mostly:
I've just used neo4j strucutre to store a graph with same label on 4M 
topics (I MUST keep it with one label), index on topic(name) property and 
used cypher to query the db,
this is my data structure. 

I've put a number of principles and principles in there, that you might 
> apply. 
> ie. Could you create the NEO4J equivalent of a temp table?
>
>
> Hope this helps.
>
>
> On Thursday, October 9, 2014 2:41:47 AM UTC-5, gg4u wrote:
>>
>> Hi Micheal, thank you.
>> sure I post my profile result here below !
>>
>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Traversing Large (weighted) graphs: performance, data structure, indexes

Reply via email to