Re: [Neo4j] Traversing Large (weighted) graphs: performance, data structure, indexes

gg4u Wed, 15 Oct 2014 15:01:55 -0700

Sure, I tried three examples with (n), (n:Topic) and allShortestPaths(), 
and also profiled them:


1.

MATCH  p = (n:Topic)-[*0..2]-(m:Topic)
 where n.name = 'Topic1' and m.name = 'Topic2'
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity DESC  LIMIT 6;

==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185           |
==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185           |

...


*==> 6 rows*
*==> 162423 ms*


profile MATCH  p = (n:Topic)-[*0..2]-(m:Topic)
 where n.name = 'Topic1' and m.name = 'Topic2'
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity DESC  LIMIT 6;

==> 6 rows
==> 
==> ColumnFilter
==>   |
==>   +Top
==>     |
==>     +Extract
==>       |
==>       +ExtractPath
==>         |
==>         +Filter
==>           |
==>           +TraversalMatcher
==> 
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |     ColumnFilter |       6 |       0 |             |                                     keep columns p, pathProximity |
==> |              Top |       6 |       0 |             |                   {  AUTOINT3}; Cached(pathProximity of type Any) |
==> |          Extract |       9 |      36 |             |                                                     pathProximity |
==> |      ExtractPath |       9 |       0 |           p |                                                                   |
==> |           Filter |       9 | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
==> | TraversalMatcher | 1010795 | 1024307 |             |                                                 m,   UNNAMED20, m |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> 


MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic))
 where n.name = 'Topic1' and m.name = 'Topic2'
 with p, n, m
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity;

==> 9 rows
*==> 10111 ms*


==> 9 rows
==> 
==> ColumnFilter
==>   |
==>   +Sort
==>     |
==>     +Extract
==>       |
==>       +ShortestPath
==>         |
==>         +SchemaIndex(0)
==>           |
==>           +SchemaIndex(1)
==> 
==> +----------------+------+--------+-------------+-----------------------------------+
==> |       Operator | Rows | DbHits | Identifiers |                             Other |
==> +----------------+------+--------+-------------+-----------------------------------+
==> |   ColumnFilter |    9 |      0 |             |     keep columns p, pathProximity |
==> |           Sort |    9 |      0 |             | Cached(pathProximity of type Any) |
==> |        Extract |    9 |     36 |             |                     pathProximity |
==> |   ShortestPath |    9 |      0 |           p |                                   |
==> | SchemaIndex(0) |    1 |      2 |        m, m |     {  AUTOSTRING1}; :Topic(name) |
==> | SchemaIndex(1) |    1 |      2 |        n, n |     {  AUTOSTRING0}; :Topic(name) |
==> +----------------+------+--------+-------------+-----------------------------------+


2. 

MATCH  p = (n:Topic)-[*0..2]-(m:Topic)   where n.name = 'Topic44' and 
m.name = 'Topic2'    return p, reduce(totProximity = 0, n IN 
relationships(p)| totProximity + n.proximity) AS pathProximity    order by 
pathProximity DESC  LIMIT 6;

==> 6 rows
*==> 906108 ms*



==> 6 rows
==> 
==> ColumnFilter
==>   |
==>   +Top
==>     |
==>     +Extract
==>       |
==>       +ExtractPath
==>         |
==>         +Filter
==>           |
==>           +TraversalMatcher
==> 
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |     ColumnFilter |       6 |       0 |             |                                     keep columns p, pathProximity |
==> |              Top |       6 |       0 |             |                   {  AUTOINT3}; Cached(pathProximity of type Any) |
==> |          Extract |      67 |     268 |             |                                                     pathProximity |
==> |      ExtractPath |      67 |       0 |           p |                                                                   |
==> |           Filter |      67 | 3246003 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
==> | TraversalMatcher | 1082001 | 1097166 |             |                                                 m,   UNNAMED20, m |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+



MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic))
 where n.name = 'Topic44' and m.name = 'Topic2'
 with p, n, m
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity;


Magically, and for the first time:
*146 ms*


so:

profile MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic))
 where n.name = 'Topic44' and m.name = 'Topic2'
 with p, n, m
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity;


==> 67 rows
==> 
==> ColumnFilter
==>   |
==>   +Sort
==>     |
==>     +Extract
==>       |
==>       +ShortestPath
==>         |
==>         +SchemaIndex(0)
==>           |
==>           +SchemaIndex(1)
==> 
==> +----------------+------+--------+-------------+-----------------------------------+
==> |       Operator | Rows | DbHits | Identifiers |                             Other |
==> +----------------+------+--------+-------------+-----------------------------------+
==> |   ColumnFilter |   67 |      0 |             |     keep columns p, pathProximity |
==> |           Sort |   67 |      0 |             | Cached(pathProximity of type Any) |
==> |        Extract |   67 |    268 |             |                     pathProximity |
==> |   ShortestPath |   67 |      0 |           p |                                   |
==> | SchemaIndex(0) |    1 |      2 |        m, m |     {  AUTOSTRING1}; :Topic(name) |
==> | SchemaIndex(1) |    1 |      2 |        n, n |     {  AUTOSTRING0}; :Topic(name) |
==> +----------------+------+--------+-------------+-----------------------------------+
==> 




3. 
So I tried:

MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic))
 where n.name = 'Topic66' and m.name = 'Topic111'
 with p, n, m
 return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
 order by pathProximity;

2 rows
34337 ms

and 

MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and m.name = 
'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN 
relationships(p)| totProximity + n.proximity) AS pathProximity order by 
pathProximity;

*2411 rows*
*3228423 ms !!*

Please also note that every row has a duplicate.
(In my structure I do have both (a:Topic)-[]->(b:Topic) and 
(b:Topic)-[]->(a:Topic), but I thought that (a:Topic)-[]-(b:Topic) would 
give unique results, since the paths are the same... huh?)
...
==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[68136874]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557           |
==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[1113182]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557           |
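The two rows above visit the same nodes but use different relationship IDs for the second hop (68136874 vs 1113182), so they count as two distinct paths. Here is a minimal Python sketch of that effect, not Neo4j code: it enumerates undirected paths over a tiny multigraph built from the IDs and weights in the output. The direction of each relationship is an assumption, since the shell output does not show it, and the function name is made up for illustration.

```python
from collections import defaultdict

# (rel_id, start, end, proximity); IDs and weights are copied from the output above.
# The direction of each relationship is an ASSUMPTION: the shell output does not show it.
RELS = [
    (68136903, "Topic66", "Topic113", 189),
    (68136874, "Topic113", "Topic111", 368),
    (1113182, "Topic111", "Topic113", 368),
]


def undirected_paths(rels, source, target, max_hops):
    """Enumerate paths of 1..max_hops relationships, ignoring direction.

    A relationship is used at most once per path, and two paths are
    different whenever they use different relationship IDs.
    """
    adjacency = defaultdict(list)
    for rel_id, start, end, weight in rels:
        adjacency[start].append((rel_id, end, weight))
        adjacency[end].append((rel_id, start, weight))

    found = []

    def walk(node, nodes, rel_ids, total):
        if node == target and rel_ids:
            found.append((tuple(nodes), tuple(rel_ids), total))
        if len(rel_ids) == max_hops:
            return
        for rel_id, neighbour, weight in adjacency[node]:
            if rel_id not in rel_ids:
                walk(neighbour, nodes + [neighbour], rel_ids + [rel_id], total + weight)

    walk(source, [source], [], 0)
    return found


paths = undirected_paths(RELS, "Topic66", "Topic111", max_hops=2)
# Collapse paths that visit the same nodes in the same order.
unique_node_sequences = {nodes for nodes, _, _ in paths}

for nodes, rel_ids, total in paths:
    print(nodes, rel_ids, total)
print(len(paths), "paths,", len(unique_node_sequences), "distinct node sequence")
```

If the two directions always carry the same proximity (as they do in these rows, but that is an assumption about the rest of the data), collapsing results by node sequence, or matching only one direction, would remove the duplicates.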




So it seems that **allShortestPaths()** is faster and gives **almost** the 
results I want, but **only** if earlier searches were made (cached). Could 
that be true?
It would partially make sense: I expect graph algorithms to be faster than 
retrieving general paths, but retrieving 67 rows of general paths cannot be 
that slow... (more than 100 times slower than allShortestPaths()??)
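To put numbers on the gap, here is a small Python sketch that only does arithmetic on the timings and profile figures reported in this message. Nothing is measured here, and the runs are not strictly comparable (different hop bounds, LIMIT in some queries, possible caching), so treat the ratios as rough.

```python
import math

# Timings in milliseconds, copied from the runs reported above.
# label -> (variable-length path match, allShortestPaths)
TIMINGS_MS = {
    "Topic1 -> Topic2": (162423, 10111),
    "Topic44 -> Topic2": (906108, 146),
    "Topic66 -> Topic111": (3228423, 34337),
}


def slowdown(path_ms, shortest_ms):
    """Return (ratio, orders of magnitude) of the plain match vs allShortestPaths."""
    ratio = path_ms / shortest_ms
    return ratio, math.log10(ratio)


for label, (path_ms, shortest_ms) in TIMINGS_MS.items():
    ratio, orders = slowdown(path_ms, shortest_ms)
    print(f"{label}: {ratio:.0f}x slower ({orders:.1f} orders of magnitude)")

# Rows produced by TraversalMatcher and DbHits charged to Filter, from the two profiles.
PROFILES = {
    "Topic1 -> Topic2": (1010795, 3032385),
    "Topic44 -> Topic2": (1082001, 3246003),
}

for label, (traversed_rows, filter_db_hits) in PROFILES.items():
    per_row = filter_db_hits / traversed_rows
    print(f"{label}: {per_row:.1f} Filter DbHits per traversed row")
```

In both profiles the Filter step costs exactly three DbHits per row coming out of TraversalMatcher, which suggests the label and name check on m is applied to about a million candidate paths instead of looking m up through the index first.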

Would it make sense if I posted a Python script that generates a random 
structure similar to mine, and posted again the configuration files used for 
my server and batch-importer plus the header I used to load the CSV with the 
batch importer, so that you could tell me whether a response time under 1 s 
(production time) is achievable?
Could you then try the same tests and post the results and a step-by-step guide?
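A minimal sketch of what such a generator could look like, scaled down so it runs in seconds. The column names in the headers are placeholders and not the real batch-importer header, the function names are made up, and storing each link in both directions with the same proximity is an assumption based on the duplicate rows shown earlier.

```python
import csv
import io
import random


def generate_graph(num_nodes, links_per_node, max_proximity=400, seed=42):
    """Build a random Topic graph whose weighted links are stored in both directions."""
    rng = random.Random(seed)
    nodes = [(i, f"Topic{i}") for i in range(1, num_nodes + 1)]
    rels = []
    for node_id, _ in nodes:
        for _ in range(links_per_node):
            other = rng.randint(1, num_nodes)
            if other == node_id:
                continue  # skip self-loops
            proximity = rng.randint(1, max_proximity)
            # ASSUMPTION: each link exists as a->b and b->a with the same weight.
            rels.append((node_id, other, "P_Topic_Link", proximity))
            rels.append((other, node_id, "P_Topic_Link", proximity))
    return nodes, rels


def write_tsv(handle, header, rows):
    """Write rows as tab-separated text with a single header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# 12 links per node, each stored twice, gives roughly 24 rels per node,
# close to the 100M rels / 4M nodes ratio of the real dataset.
nodes, rels = generate_graph(num_nodes=1000, links_per_node=12)

# Placeholder header names; swap in the real importer header before loading.
preview = io.StringIO()
write_tsv(preview, ["id", "name"], nodes[:3])
print(preview.getvalue())
print(len(nodes), "nodes,", len(rels), "rels")
```

To produce real files, pass something like `open("nodes.csv", "w", newline="")` instead of the in-memory buffer, and raise `num_nodes` towards the real size.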





On Wednesday, 15 October 2014 21:56:01 UTC+2, Michael Hunger wrote:
>
> Can you just try this please?
>
> MATCH  p = (n:Topic)-[*0..2]-(m:Topic) 
>  where n.name = 'Topic1' and m.name = 'Topic2'  
>  return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity  
>  order by pathProximity DESC  LIMIT 6;
>
>
>
> On Wed, Oct 15, 2014 at 2:52 PM, gg4u <[email protected]> wrote:
>
>> Hi Michael,
>>
>> sorry, I don't understand what this means.
>> Can I help you help me sort out the issue somehow? :)
>>
>> What could I check or correct?
>> What is a pattern matcher, and can you teach me how to read the profile 
>> to reach your conclusion?
>> What are the possible reasons for selecting the wrong pattern matcher, and 
>> how can I correct it?
>>
>> thank you
>>
>> On Wednesday, 15 October 2014 14:04:57 UTC+2, Michael Hunger wrote:
>>>
>>> Hi,
>>>
>>> from the profiling it seems that Cypher selects the wrong pattern 
>>> matcher if we separate the node-lookup and path-match.
>>>
>>> profile
>>>  MATCH  p = (n:Topic)-[*0..2]-(m:Topic) 
>>>  where n.name = 'Topic1' and m.name = 'Topic2'  
>>>  return p, reduce(totProximity = 0, n IN relationships(p)| totProximity 
>>> + n.proximity) AS pathProximity  
>>>  order by pathProximity DESC  LIMIT 6;
>>>
>>>
>>> +------------------+------+--------+-------------+----------
>>> ---------------------------------------------------------+
>>> |         Operator | Rows | DbHits | Identifiers |                       
>>>                                       Other |
>>> +------------------+------+--------+-------------+----------
>>> ---------------------------------------------------------+
>>> |     ColumnFilter |    0 |      0 |             |                       
>>>               keep columns p, pathProximity |
>>> |              Top |    0 |      0 |             |                   { 
>>>  AUTOINT3}; Cached(pathProximity of type Any) |
>>> |          Extract |    0 |      0 |             |                       
>>>                               pathProximity |
>>> |      ExtractPath |    0 |      0 |           p |                       
>>>                                             |
>>> |           Filter |    0 |      0 |             | (hasLabel(m:Topic(0)) 
>>> AND Property(m,name(1)) == {  AUTOSTRING1}) |
>>> | TraversalMatcher |    0 |      1 |             |                       
>>>                           m,   UNNAMED20, m |
>>> +------------------+------+--------+-------------+----------
>>> ---------------------------------------------------------+
>>>
>>> On Wed, Oct 15, 2014 at 11:00 AM, gg4u <[email protected]> wrote:
>>>
>>>> Hi Michael, 
>>>>
>>>> your aggregation was only on the same paths, so you get 9 different 
>>>>> paths but you didn't show the counts per path. 
>>>>>
>>>>
>>>> That is not clear to me yet; I am going to post the results for each 
>>>> query you suggested.
>>>>
>>>> Rodger, to summarize this test:
>>>> 4M nodes labeled 'Topic'
>>>> 100M rels (weighted)
>>>> Index on Topic(name), a string property on each node
>>>> 'Topic' dominates the whole dataset, and this will be a subgraph of a 
>>>> larger network (if I can get this to production response times, the next 
>>>> step will be a graph of 85M nodes and ~2B rels with the same type of 
>>>> structure, keeping properties on the nodes rather than decoupling them 
>>>> into other nodes). So this is a primary, real-case test to see whether 
>>>> the Neo4j data structure is feasible vs. NoSQL.
>>>> And I'd love the answer to be yes :D
>>>>
>>>> Michael, here is another test with other topics (I think not cached):
>>>>
>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 
>>>> 'Topic100' and m.name = 'Topic2' with p, n, m return p, count(*) 
>>>> order by count(*);
>>>>
>>>> results:
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> | p                                                                 
>>>>                                                                            
>>>>  
>>>>                                                                            
>>>>  
>>>>                         | count(*) |
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[
>>>> 10618620]{proximity:90},Node[3528892]{id:411782,name:"
>>>> Topic101"},:P_Topic_Link[1025954]{proximity:68},Node[
>>>> 1386672]{id:21245,name:"Topic2"}]                                     
>>>>           | 1        |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[
>>>> 2424845]{proximity:91},Node[3719110]{id:52502,name:"
>>>> Topic102"},:P_Topic_Link[1025923]{proximity:85},Node[
>>>> 1386672]{id:21245,name:"Topic2"}]                    | 1        |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[
>>>> 100682940]{proximity:19},Node[3461206]{id:39782569,name:"
>>>> Topic103"},:P_Topic_Link[100682931]{proximity:107},
>>>> Node[1386672]{id:21245,name:"Topic2"}]            | 1        |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[
>>>> 21653222]{proximity:82},Node[706102]{id:1551073,name:"
>>>> Topic104"},:P_Topic_Link[21653218]{proximity:87},Node[
>>>> 1386672]{id:21245,name:"Topic2"}]                                 | 1 
>>>>        |
>>>>
>>>> (.... results ...)
>>>>  
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> *67 rows*
>>>> ==>* 3900775 ms*
>>>>
>>>>
>>>>
>>>> On Tuesday, 14 October 2014 22:54:43 UTC+2, Michael Hunger wrote:
>>>>>
>>>>> How many rows does this return?
>>>>>
>>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 
>>>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order 
>>>>> by count(*);
>>>>>
>>>>> your aggregation was only on the same paths, so you get 9 different 
>>>>> paths but you didn't show the counts per path. 
>>>>>
>>>>  
>>>>>
>>>>
>>>>> and obtain 9 rows in 182799 ms
>>>>>
>>>>> On Tue, Oct 14, 2014 at 10:59 AM, gg4u <[email protected]> wrote:
>>>>>
>>>>>> Yes:
>>>>>>
>>>>>> neo4j-sh (?)$ profile  MATCH (n:Topic), (m:Topic) where n.name = 
>>>>>> 'Topic1' and m.name = 'Topic2'  MATCH  p = (n)-[*0..2]-(m) return p, 
>>>>>> reduce(totProximity = 0, n IN relationships(p)| totProximity + 
>>>>>> n.proximity) 
>>>>>> AS pathProximity  order by pathProximity DESC  LIMIT 6;
>>>>>> ==> 
>>>>>> [...results...]
>>>>>> ==> 6 rows
>>>>>> ==> 
>>>>>> ==> ColumnFilter
>>>>>> ==>   |
>>>>>> ==>   +Top
>>>>>> ==>     |
>>>>>> ==>     +Extract
>>>>>> ==>       |
>>>>>> ==>       +ExtractPath
>>>>>> ==>         |
>>>>>> ==>         +PatternMatcher
>>>>>> ==>           |
>>>>>> ==>           +SchemaIndex(0)
>>>>>> ==>             |
>>>>>> ==>             +SchemaIndex(1)
>>>>>> ==> 
>>>>>> ==> +----------------+------+--------+-------------------+------
>>>>>> -------------------------------------------+
>>>>>> ==> |       Operator | Rows | DbHits |       Identifiers |           
>>>>>>                                 Other |
>>>>>> ==> +----------------+------+--------+-------------------+------
>>>>>> -------------------------------------------+
>>>>>> ==> |   ColumnFilter |    6 |      0 |                   |           
>>>>>>         keep columns p, pathProximity |
>>>>>> ==> |            Top |    6 |      0 |                   | { 
>>>>>>  AUTOINT3}; Cached(pathProximity of type Any) |
>>>>>> ==> |        Extract |    9 |     36 |                   |           
>>>>>>                         pathProximity |
>>>>>> ==> |    ExtractPath |    9 |      0 |                 p |           
>>>>>>                                       |
>>>>>> ==> | PatternMatcher |    9 |      0 | n, m,   UNNAMED94 |           
>>>>>>                                       |
>>>>>> ==> | SchemaIndex(0) |    1 |      2 |              m, m |           
>>>>>>         {  AUTOSTRING1}; :Topic(name) |
>>>>>> ==> | SchemaIndex(1) |    1 |      2 |              n, n |           
>>>>>>         {  AUTOSTRING0}; :Topic(name) |
>>>>>> ==> +----------------+------+--------+-------------------+------
>>>>>> -------------------------------------------+
>>>>>> ==> 
>>>>>> neo4j-sh (?)$ 
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tuesday, 14 October 2014 10:00:29 UTC+2, Michael Hunger wrote:
>>>>>>>
>>>>>>> Can you try this:
>>>>>>>
>>>>>>> profile 
>>>>>>> MATCH (n:Topic), (m:Topic)
>>>>>>>  where n.name = 'Topic1' and m.name = 'Topic2' 
>>>>>>> MATCH  p = (n)-[*0..2]-(m)
>>>>>>> return p, reduce(totProximity = 0, n IN relationships(p)| 
>>>>>>> totProximity + n.proximity) AS pathProximity 
>>>>>>> order by pathProximity DESC 
>>>>>>> LIMIT 6
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 14, 2014 at 9:06 AM, gg4u <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Rodger,
>>>>>>>>
>>>>>>>> thank you for your insights!
>>>>>>>> please see comments below:
>>>>>>>>
>>>>>>>> On Monday, 13 October 2014 18:37:50 UTC+2, Rodger wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I've done a lot of RDBMS performance tuning.
>>>>>>>>> Just a few quick thoughts.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Be sure to run the queries in the shell, if you are not already 
>>>>>>>>> doing so.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, they are run in the shell:
>>>>>>>> http://localhost:7474/webadmin/#/console/
>>>>>>>>  
>>>>>>>>
>>>>>>>>> How many rows are returned? Just sorting, then returning many 
>>>>>>>>> rows, 
>>>>>>>>> takes a long time to scroll them to output. 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> 9 rows
>>>>>>>> In the answer above, I wrote 9 paths
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you are getting duplicates, it may be the equivalent of a 
>>>>>>>>> cartesian product, 
>>>>>>>>> one of the worst things that can happen in RDBMS, and also one
>>>>>>>>> of the least known. See my presentation on them here:
>>>>>>>>> http://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/
>>>>>>>>>
>>>>>>>>
>>>>>>>> So I had a look at your pdf,
>>>>>>>> http://rodgersnotes.files.wordpress.com/2010/09/cartprodwordpress.pdf
>>>>>>>> page 11
>>>>>>>>
>>>>>>>> and I think the idea you want to suggest is to avoid duplicates 
>>>>>>>> (you called them 'cartesian products') by enforcing conditions.
>>>>>>>> However, since this is a graph db and not a relational one, it is not 
>>>>>>>> clear to me where this applies, because in the graph db I don't have 
>>>>>>>> 'joined' queries between tables,
>>>>>>>> so the conditions I have are, at least in my case, properties 
>>>>>>>> (index on properties) and undirected rels.
>>>>>>>>  
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Try:
>>>>>>>>>
>>>>>>>>> return p, count (*) 
>>>>>>>>> order by count(*)
>>>>>>>>>
>>>>>>>>
>>>>>>>> I ran:
>>>>>>>>
>>>>>>>> profile MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where 
>>>>>>>> n.name = 'Topic1' and m.name = 'Topic2' with p, n, m return p, 
>>>>>>>> count(*) order by count(*);
>>>>>>>>
>>>>>>>> and I've got: (see there are also duplicates in paths: is it 
>>>>>>>> because I have both (a)-[]->(b) and (a)<-[]-(b) ?)
>>>>>>>>
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> | p                                                             
>>>>>>>>                                                                        
>>>>>>>>      
>>>>>>>>                                                                        
>>>>>>>>      
>>>>>>>>                       | count(*) |
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[711852
>>>>>>>> 98]{proximity:68},Node[1401899]{id:21375850,name:"Topic3"},:
>>>>>>>> P_Topic_Link[71185313]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                   | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[886757
>>>>>>>> 19]{proximity:28},Node[2594397]{id:31760062,name:"Topic4"},:
>>>>>>>> P_Topic_Link[88675745]{proximity:23},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>           | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[307360
>>>>>>>> 00]{proximity:32},Node[2515502]{id:3106745,name:"Topic5"},:P
>>>>>>>> _Topic_Link[30735974]{proximity:82},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>> | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[682063
>>>>>>>> 83]{proximity:72},Node[1202629]{id:19635605,name:"Topic6"},:
>>>>>>>> P_Topic_Link[68206440]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>              | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[988981
>>>>>>>> 73]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:
>>>>>>>> P_Topic_Link[98898126]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                        | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[581077
>>>>>>>> 55]{proximity:55},Node[506613]{id:13841207,name:"Topic8"},:P
>>>>>>>> _Topic_Link[58107766]{proximity:27},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                             | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[988981
>>>>>>>> 73]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:
>>>>>>>> P_Topic_Link[1025873]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                         | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[566262
>>>>>>>> 6]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_To
>>>>>>>> pic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                  | 1        |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[566262
>>>>>>>> 6]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_To
>>>>>>>> pic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}]
>>>>>>>>  
>>>>>>>>                  | 1        |
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> 9 rows
>>>>>>>> ==> 
>>>>>>>> ==> ColumnFilter(0)
>>>>>>>> ==>   |
>>>>>>>> ==>   +Sort
>>>>>>>> ==>     |
>>>>>>>> ==>     +EagerAggregation
>>>>>>>> ==>       |
>>>>>>>> ==>       +ColumnFilter(1)
>>>>>>>> ==>         |
>>>>>>>> ==>         +ExtractPath
>>>>>>>> ==>           |
>>>>>>>> ==>           +Filter
>>>>>>>> ==>             |
>>>>>>>> ==>             +TraversalMatcher
>>>>>>>> ==> 
>>>>>>>> ==> +------------------+---------+---------+-------------+------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ----------------+
>>>>>>>> ==> |         Operator |    Rows |  DbHits | Identifiers |         
>>>>>>>>                                                                    
>>>>>>>> Other |
>>>>>>>> ==> +------------------+---------+---------+-------------+------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ----------------+
>>>>>>>> ==> |  ColumnFilter(0) |       9 |       0 |             |         
>>>>>>>>                                                 keep columns p, 
>>>>>>>> count(*) |
>>>>>>>> ==> |             Sort |       9 |       0 |             | Cached( 
>>>>>>>>  INTERNAL_AGGREGATE931614f3-4def-4fc4-a80b-c6fca3839817 of type 
>>>>>>>> Integer) |
>>>>>>>> ==> | EagerAggregation |       9 |       0 |             |         
>>>>>>>>                                                                        
>>>>>>>> p |
>>>>>>>> ==> |  ColumnFilter(1) |       9 |       0 |             |         
>>>>>>>>                                                     keep columns p, n, 
>>>>>>>> m |
>>>>>>>> ==> |      ExtractPath |       9 |       0 |           p |         
>>>>>>>>                                                                        
>>>>>>>>   |
>>>>>>>> ==> |           Filter |       9 | 3032385 |             |         
>>>>>>>>        (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  
>>>>>>>> AUTOSTRING1}) |
>>>>>>>> ==> | TraversalMatcher | 1010795 | 1024307 |             |         
>>>>>>>>                                                        m,   UNNAMED36, 
>>>>>>>> m |
>>>>>>>> ==> +------------------+---------+---------+-------------+------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ----------------+
>>>>>>>> ==> 
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without me looking at the raw data, and the query result, you
>>>>>>>>> seem to have many operations going on. So, you have a lot of rows 
>>>>>>>>> in 
>>>>>>>>> the profile output. 
>>>>>>>>>
>>>>>>>>
>>>>>>>> Only 9
>>>>>>>>  
>>>>>>>>
>>>>>>>>>  As a general rule, the more rows there are in the 
>>>>>>>>> profile, the slower the response time is. 
>>>>>>>>> ie. the more complex the query, the slower it is. 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I were looking at this, I would try to isolate which part of 
>>>>>>>>> the query is the slow part.  The Return clause, or the Match 
>>>>>>>>> clause?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You've already tried the response times with the data.
>>>>>>>>> Try to simply: 
>>>>>>>>> return count(*) .
>>>>>>>>>
>>>>>>>>
>>>>>>>> I ran:
>>>>>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 
>>>>>>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) 
>>>>>>>> order by count(*);
>>>>>>>>
>>>>>>>> and obtained 9 rows in 182799 ms.
>>>>>>>>
>>>>>>>> I ran:
>>>>>>>> MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name = 
>>>>>>>> 'Topic2' with n, m return count(*);
>>>>>>>>
>>>>>>>> and it took 856 ms.
>>>>>>>>
>>>>>>>>
>>>>>>>> profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and 
>>>>>>>> m.name = 'Topic2' with n, m return count(*);
>>>>>>>>
>>>>>>>> results in:
>>>>>>>>
>>>>>>>>
>>>>>>>> ==> ColumnFilter
>>>>>>>> ==>   |
>>>>>>>> ==>   +EagerAggregation
>>>>>>>> ==>     |
>>>>>>>> ==>     +SchemaIndex(0)
>>>>>>>> ==>       |
>>>>>>>> ==>       +SchemaIndex(1)
>>>>>>>> ==> 
>>>>>>>> ==> +------------------+------+--------+-------------+----------
>>>>>>>> ---------------------+
>>>>>>>> ==> |         Operator | Rows | DbHits | Identifiers |             
>>>>>>>>             Other |
>>>>>>>> ==> +------------------+------+--------+-------------+----------
>>>>>>>> ---------------------+
>>>>>>>> ==> |     ColumnFilter |    1 |      0 |             |         keep 
>>>>>>>> columns count(*) |
>>>>>>>> ==> | EagerAggregation |    1 |      0 |             |             
>>>>>>>>                   |
>>>>>>>> ==> |   SchemaIndex(0) |    1 |      2 |        m, m | { 
>>>>>>>>  AUTOSTRING1}; :Topic(name) |
>>>>>>>> ==> |   SchemaIndex(1) |    1 |      2 |        n, n | { 
>>>>>>>>  AUTOSTRING0}; :Topic(name) |
>>>>>>>> ==> +------------------+------+--------+-------------+----------
>>>>>>>> ---------------------+
>>>>>>>>  
>>>>>>>>
>>>>>>>>> How many seconds response time is that, versus the original query? 
>>>>>>>>> What is the resulting profile?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> So it looks like traversing the graph actually takes a huge amount 
>>>>>>>> of time,
>>>>>>>> while matching a node by its full name string takes a reasonable ~900 ms.
>>>>>>>>
>>>>>>>> *Any idea for improving the performance of the traversal??*
>>>>>>>>
>>>>>>>> *It is a real problem, since I met the same problem even when getting 
>>>>>>>> the first neighbors of a node, which currently makes this 
>>>>>>>> unfeasible for production.*
>>>>>>>> *Does anyone have a real case with a graph of similar size and 
>>>>>>>> structure, running a similar query?*
>>>>>>>>
>>>>>>>> As an example, this query obtains the first neighbors of node Topic44:
>>>>>>>>
>>>>>>>> MATCH (n:Topic) , (m), p = (n)-[*0..1]-(m)
>>>>>>>> where n.name = 'Topic44' 
>>>>>>>> with p, n, m
>>>>>>>> return p, reduce(totProximity = 0, n IN relationships(p)| 
>>>>>>>> totProximity + n.proximity) AS pathProximity order by pathProximity 
>>>>>>>> DESC 
>>>>>>>> LIMIT 6
>>>>>>>>
>>>>>>>> returns
>>>>>>>> 6 rows in ~65000 ms VS 6 rows in less than a second with a NoSQL.
>>>>>>>>
>>>>>>>> Any idea?
>>>>>>>>
>>>>>>>> thank you guys for helping!! Hope to find a solution soon..
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> See also the tuning presentations I've done: 
>>>>>>>>> http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/
>>>>>>>>> http://rodgersnotes.wordpress.com/2014/06/08/tuning-the-untunable-when-indexes-and-optimizer-dont-help-2/
>>>>>>>>> They are quick reads. 
>>>>>>>>>
>>>>>>>> Thank you, I've seen them; 
>>>>>>>> they are mostly about SQL tuning.
>>>>>>>> I've just used the Neo4j structure to store a graph with the same label 
>>>>>>>> on 4M topics (I MUST keep it with one label), an index on the 
>>>>>>>> Topic(name) property, 
>>>>>>>> and Cypher to query the db.
>>>>>>>> This is my data structure. 
>>>>>>>>
>>>>>>>> I've put a number of principles and principles in there, that you 
>>>>>>>>> might apply. 
>>>>>>>>> ie. Could you create the NEO4J equivalent of a temp table?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, October 9, 2014 2:41:47 AM UTC-5, gg4u wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Michael, thank you.
>>>>>>>>>> Sure, I'll post my profile result below!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>  -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "Neo4j" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
