Re: [Neo4j] Traversing Large (weighted) graphs: performance, data structure, indexes

gg4u Wed, 15 Oct 2014 15:13:10 -0700

Profile for the last query:
profile MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and 
m.name = 'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN 
relationships(p)| totProximity + n.proximity) AS pathProximity order by 
pathProximity;
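
For readers less used to Cypher's reduce(): the expression in the query above sums the proximity property over the relationships of each path. A minimal Python sketch of the same fold follows, using the two proximity values (189 and 368) from one of the Topic66 to Topic111 rows quoted further down in this thread. The function name path_proximity is only illustrative and is not part of any Neo4j API.

```python
from functools import reduce


def path_proximity(proximities):
    """Sum relationship weights along one path, like the Cypher reduce()."""
    # The accumulator starts at 0, matching `totProximity = 0` in the query.
    return reduce(lambda tot, prox: tot + prox, proximities, 0)


# Proximity values of the two relationships on one returned path.
print(path_proximity([189, 368]))  # 557
```

A path with no relationships (possible with the [*0..2] variants used elsewhere in the thread) would simply get 0.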


==> 2411 rows
==> 
==> ColumnFilter(0)
==>   |
==>   +Sort
==>     |
==>     +Extract
==>       |
==>       +ColumnFilter(1)
==>         |
==>         +ExtractPath
==>           |
==>           +Filter
==>             |
==>             +TraversalMatcher
==> 
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |  ColumnFilter(0) |    2411 |       0 |             |                                     keep columns p, pathProximity |
==> |             Sort |    2411 |       0 |             |                                 Cached(pathProximity of type Any) |
==> |          Extract |    2411 |  *9640* |             |                                                     pathProximity |
==> |  ColumnFilter(1) |    2411 |       0 |             |                                              keep columns p, n, m |
==> |      ExtractPath |    2411 |       0 |           p |                                                                   |
==> |           Filter |    2411 | 4910094 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
==> | TraversalMatcher | 1636698 | 1681810 |             |                                                 m,   UNNAMED19, m |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
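
One way to read the numbers in this profile: the Filter is charged db hits for every row the TraversalMatcher hands up, so the cost follows the roughly 1.6M traversed paths rather than the 2411 rows that survive. The Python sketch below only checks that arithmetic, with figures copied from this profile and from the two TraversalMatcher profiles quoted below. The helper name filter_cost is hypothetical, and interpreting the constant per-row cost as a label check plus property lookups is an assumption that the profile output does not confirm.

```python
# (rows out of TraversalMatcher, DbHits charged to Filter, rows kept by Filter)
profiles = {
    "Topic1 -> Topic2": (1010795, 3032385, 9),
    "Topic44 -> Topic2": (1082001, 3246003, 67),
    "Topic66 -> Topic111": (1636698, 4910094, 2411),
}


def filter_cost(traversed, filter_hits, kept):
    """Return (db hits per traversed row, fraction of traversed rows kept)."""
    return filter_hits / traversed, kept / traversed


for name, numbers in profiles.items():
    hits_per_row, kept_fraction = filter_cost(*numbers)
    print(f"{name}: {hits_per_row:.1f} hits/row, {kept_fraction:.4%} kept")
```

In all three profiles the Filter's DbHits are exactly three times the traversed row count, while well under 1% of the traversed rows are kept.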

On Thursday, 16 October 2014 00:01:33 UTC+2, gg4u wrote:
>
> Sure, I tried three examples with (n), (n:Topic) and allShortestPaths(), 
> and profiled them as well:
>
> 1.
>
> MATCH  p = (n:Topic)-[*0..2]-(m:Topic)   where n.name = 'Topic1' and 
> m.name = 'Topic2'    return p, reduce(totProximity = 0, n IN 
> relationships(p)| totProximity + n.proximity) AS pathProximity    order by 
> pathProximity DESC  LIMIT 6;
>
> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185           |
> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185           |
>
> ...
>
>
> *==> 6 rows*
> *==> 162423 ms*
>
>
> profile MATCH  p = (n:Topic)-[*0..2]-(m:Topic)   where n.name = 
> 'Topic1' and m.name = 'Topic2'    return p, reduce(totProximity = 0, n IN 
> relationships(p)| totProximity + n.proximity) AS pathProximity    order by 
> pathProximity DESC  LIMIT 6;
>
> ==> 6 rows
> ==> 
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +Filter
> ==>           |
> ==>           +TraversalMatcher
> ==> 
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> |     ColumnFilter |       6 |       0 |             |                                     keep columns p, pathProximity |
> ==> |              Top |       6 |       0 |             |                 {  AUTOINT3}; *Cached(pathProximity of type Any)* |
> ==> |          Extract |       9 |      36 |             |                                                     pathProximity |
> ==> |      ExtractPath |       9 |       0 |           p |                                                                   |
> ==> |           Filter |       9 | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
> ==> | TraversalMatcher | 1010795 | 1024307 |             |                                                 m,   UNNAMED20, m |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> 
>
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 
> 'Topic1' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity 
> = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity 
> order by pathProximity;
>
> ==> 9 rows
> *==> 10111 ms*
>
>
> ==> 9 rows
> ==> 
> ==> ColumnFilter
> ==>   |
> ==>   +Sort
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ShortestPath
> ==>         |
> ==>         +SchemaIndex(0)
> ==>           |
> ==>           +SchemaIndex(1)
> ==> 
> ==> +----------------+------+--------+-------------+-------------------------------------+
> ==> |       Operator | Rows | DbHits | Identifiers |                               Other |
> ==> +----------------+------+--------+-------------+-------------------------------------+
> ==> |   ColumnFilter |    9 |      0 |             |       keep columns p, pathProximity |
> ==> |           Sort |    9 |      0 |             | *Cached(pathProximity of type Any)* |
> ==> |        Extract |    9 |     36 |             |                       pathProximity |
> ==> |   ShortestPath |    9 |      0 |           p |                                     |
> ==> | SchemaIndex(0) |    1 |      2 |        m, m |       {  AUTOSTRING1}; :Topic(name) |
> ==> | SchemaIndex(1) |    1 |      2 |        n, n |       {  AUTOSTRING0}; :Topic(name) |
> ==> +----------------+------+--------+-------------+-------------------------------------+
>
>
> 2. 
>
> MATCH  p = (n:Topic)-[*0..2]-(m:Topic)   where n.name = 'Topic44' and 
> m.name = 'Topic2'    return p, reduce(totProximity = 0, n IN 
> relationships(p)| totProximity + n.proximity) AS pathProximity    order by 
> pathProximity DESC  LIMIT 6;
>
> ==> 6 rows
> *==> 906108 ms*
>
>
>
> ==> 6 rows
> ==> 
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +Filter
> ==>           |
> ==>           +TraversalMatcher
> ==> 
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> |     ColumnFilter |       6 |       0 |             |                                     keep columns p, pathProximity |
> ==> |              Top |       6 |       0 |             |                   {  AUTOINT3}; Cached(pathProximity of type Any) |
> ==> |          Extract |      67 |     268 |             |                                                     pathProximity |
> ==> |      ExtractPath |      67 |       0 |           p |                                                                   |
> ==> |           Filter |      67 | 3246003 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
> ==> | TraversalMatcher | 1082001 | 1097166 |             |                                                 m,   UNNAMED20, m |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
>
>
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 
> 'Topic44' and m.name = 'Topic2' with p, n, m return p, 
> reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) 
> AS pathProximity order by pathProximity;
>
>
> Magically, and for the first time:
> *146ms*
>
>
> so:
>
> profile MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where 
> n.name = 'Topic44' and m.name = 'Topic2' with p, n, m return p, 
> reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) 
> AS pathProximity order by pathProximity;
>
>
> ==> 67 rows
> ==> 
> ==> ColumnFilter
> ==>   |
> ==>   +Sort
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ShortestPath
> ==>         |
> ==>         +SchemaIndex(0)
> ==>           |
> ==>           +SchemaIndex(1)
> ==> 
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> |       Operator | Rows | DbHits | Identifiers |                             Other |
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> |   ColumnFilter |   67 |      0 |             |     keep columns p, pathProximity |
> ==> |           Sort |   67 |      0 |             | Cached(pathProximity of type Any) |
> ==> |        Extract |   67 |    268 |             |                     pathProximity |
> ==> |   ShortestPath |   67 |      0 |           p |                                   |
> ==> | SchemaIndex(0) |    1 |      2 |        m, m |     {  AUTOSTRING1}; :Topic(name) |
> ==> | SchemaIndex(1) |    1 |      2 |        n, n |     {  AUTOSTRING0}; :Topic(name) |
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> 
>
>
>
>
> 3. 
> So I tried:
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 
> 'Topic66' and m.name = 'Topic111' with p, n, m return p, 
> reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) 
> AS pathProximity order by pathProximity;
>
> 2 rows
> 34337 ms
>
> and 
>
> MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and m.name 
> = 'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN 
> relationships(p)| totProximity + n.proximity) AS pathProximity order by 
> pathProximity;
>
> *2411 rows*
> *3228423 ms !!*
>
> Please also note that each row has a duplicate.
> (In my structure I have both (a:Topic)-[]->(b:Topic) and 
> (b:Topic)-[]->(a:Topic), but I thought that (a:Topic)-[]-(b:Topic) would 
> give unique results, since the paths are the same... huh?)
> ...
> ==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[68136874]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557           |
> ==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[1113182]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557           |
>
>
>
>
> So allShortestPaths() is faster and returns **almost** the results I want, 
> but **only** if the same search was run before (cached). Could that be the 
> case?
> It would partially make sense: I expect a graph algorithm to be faster than 
> retrieving general paths, but retrieving 67 rows of general paths should 
> not be that slow (906108 ms against 146 ms for allShortestPaths(), more 
> than three orders of magnitude).
>
> Would it help if I posted a Python script that generates a random 
> structure similar to mine, together with the configuration files used for 
> my server and the batch importer, and the header I used for loading the CSV 
> with the batch importer? You could then tell me whether a response time 
> under 1 s (what I need for production) is achievable.
> Could you run the same tests and post the results and a step-by-step guide?
>
>
>
>
>
> On Wednesday, 15 October 2014 21:56:01 UTC+2, Michael Hunger wrote:
>
> Can you just try this please?
>
> MATCH  p = (n:Topic)-[*0..2]-(m:Topic) 
>  where n.name = 'Topic1' and m.name = 'Topic2'  
>  return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity  
>  order by pathProximity DESC  LIMIT 6;
>
>
>
> On Wed, Oct 15, 2014 at 2:52 PM, gg4u <[email protected]> wrote:
>
> Hi Michael,
>
> sorry, I don't understand what that means.
> Can I do anything to help you sort out the issue? :)
>
> What could I check or correct?
> What is a pattern matcher, and can you teach me how to read the profile so 
> that I can follow your conclusion?
> What are possible reasons for selecting the wrong pattern matcher, and how 
> can I correct it?
>
> thank you
>
> On Wednesday, 15 October 2014 14:04:57 UTC+2, Michael Hunger wrote:
>
> Hi,
>
> from the profiling it seems that Cypher selects the wrong pattern matcher 
> if we separate the node-lookup and path-match.
>
> profile
>  MATCH  p = (n:Topic)-[*0..2]-(m:Topic) 
>  where n.name = 'Topic1' and m.name = 'Topic2'  
>  return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity  
>  order by pathProximity DESC  LIMIT 6;
>
>
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
> |         Operator | Rows | DbHits | Identifiers |                                                             Other |
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
> |     ColumnFilter |    0 |      0 |             |                                     keep columns p, pathProximity |
> |              Top |    0 |      0 |             |                   {  AUTOINT3}; Cached(pathProximity of type Any) |
> |          Extract |    0 |      0 |             |                                                     pathProximity |
> |      ExtractPath |    0 |      0 |           p |                                                                   |
> |           Filter |    0 |      0 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
> | TraversalMatcher |    0 |      1 |             |                                                 m,   UNNAMED20, m |
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
>
> On Wed, Oct 15, 2014 at 11:00 AM, gg4u <[email protected]> wrote:
>
> Hi Michael,
>
> your aggregation was only on the same paths, so you get 9 different paths 
> but you didn't show the counts per path. 
>
>
> That is not clear to me yet; I am going to post the results for each query 
> you suggested.
>
> Rodger, to summarize this test:
> 4M nodes labeled 'Topic'
> 100M rels (weighted)
> Index on Topic(name), a string property present on every node
> 'Topic' dominates the whole dataset, which will be a subgraph of a larger 
> network (if I can get this to production response times, the next step will 
> be a graph of 85M nodes and ~2B rels with the same kind of structure, 
> keeping properties on the nodes rather than decoupling them into other 
> nodes). So this is a first real-case test to see whether the Neo4j data 
> structure is feasible compared with NoSQL.
> And I'd love the answer to be yes :D
>
> Michael, here is another test with other topics (not cached, I think):
>
> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 
> 'Topic100' and m.name = 'Topic2' with p, n, m return p, count(*) order 
> by count(*);
>
> results:
> ==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> ==> | p | count(*) |
> ==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[10618620]{proximity:90},Node[3528892]{id:411782,name:"Topic101"},:P_Topic_Link[1025954]{proximity:68},Node[1386672]{id:21245,name:"Topic2"}] | 1        |
> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[2424845]{proximity:91},Node[3719110]{id:52502,name:"Topic102"},:P_Topic_Link[1025923]{proximity:85},Node[1386672]{id:21245,name:"Topic2"}] | 1        |
> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[100682940]{proximity:19},Node[3461206]{id:39782569,name:"Topic103"},:P_Topic_Link[100682931]{proximity:107},Node[1386672]{id:21245,name:"Topic2"}] | 1        |
> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[21653222]{proximity:82},Node[706102]{id:1551073,name:"Topic104"},:P_Topic_Link[21653218]{proximity:87},Node[1386672]{id:21245,name:"Topic2"}] | 1        |
>
> (.... results ...)
>  
> ==> +-----------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ---------------+
> ==> *67 rows*
> ==>* 3900775 ms*
>
>
>
> On Tuesday, 14 October 2014 22:54:43 UTC+2, Michael Hunger wrote:
>
> How many rows does this return?
>
> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name = 'Topic1' 
> and m.name = 'Topic2' with p, n, m return p, count(*) order by count(*);
>
> your aggregation was only on the same paths, so you get 9 different paths 
> but you didn't show the counts per path. 
>
>  
>
>
> and obtain 9 rows in 182799 ms
>
> On Tue, Oct 14, 2014 at 10:59 AM, gg4u <[email protected]> wrote:
>
> Yes:
>
> neo4j-sh (?)$ profile  MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' 
> and m.name = 'Topic2'  MATCH  p = (n)-[*0..2]-(m) return p, 
> reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) 
> AS pathProximity  order by pathProximity DESC  LIMIT 6;
> ==> 
> [...results...]
> ==> 6 rows
> ==> 
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +PatternMatcher
> ==>           |
> ==>           +SchemaIndex(0)
> ==>             |
> ==>             +SchemaIndex(1)
> ==> 
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==> |       Operator | Rows | DbHits |       Identifiers |                                           Other |
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==> |   ColumnFilter |    6 |      0 |                   |                   keep columns p, pathProximity |
> ==> |            Top |    6 |      0 |                   | {  AUTOINT3}; Cached(pathProximity of type Any) |
> ==> |        Extract |    9 |     36 |                   |                                   pathProximity |
> ==> |    ExtractPath |    9 |      0 |                 p |                                                 |
> ==> | PatternMatcher |    9 |      0 | n, m,   UNNAMED94 |                                                 |
> ==> | SchemaIndex(0) |    1 |      2 |              m, m |                   {  AUTOSTRING1}; :Topic(name) |
> ==> | SchemaIndex(1) |    1 |      2 |              n, n |                   {  AUTOSTRING0}; :Topic(name) |
> ==> +----------------+------+--------+-------------------+-------------------------------------------------+
> ==> 
> neo4j-sh (?)$ 
>
>
>
> On Tuesday, 14 October 2014 10:00:29 UTC+2, Michael Hunger wrote:
>
> Can you try this:
>
> profile 
> MATCH (n:Topic), (m:Topic)
>  where n.name = 'Topic1' and m.name = 'Topic2' 
> MATCH  p = (n)-[*0..2]-(m)
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity 
> order by pathProximity DESC 
> LIMIT 6
>
>
>
> ...
