Thank you, Mark. You're right: this thread has become hard to follow, and the issue is still open. Since I haven't found a solution, I will re-import everything from scratch; maybe I am doing something wrong when importing or creating the indexes?
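
For anyone who wants to try the same tests without the full data set, here is a minimal sketch of how a random weighted graph of this shape could be generated. It is an illustration only, not the script linked in this thread: the function name `generate_graph`, its parameters, and the choice to store every link in both directions with the same proximity are assumptions based on the structure described in this thread.

```python
import random


def generate_graph(num_topics, num_pairs, max_proximity=500, seed=42):
    """Return (names, rels) for a small random weighted graph.

    names: node labels "Topic1" .. "TopicN"
    rels:  (start_name, end_name, proximity) tuples; each linked pair is
           stored in both directions (assumption, mirrors the thread).
    """
    max_pairs = num_topics * (num_topics - 1) // 2
    if num_pairs > max_pairs:
        raise ValueError("num_pairs exceeds the number of distinct node pairs")

    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    names = ["Topic%d" % i for i in range(1, num_topics + 1)]
    linked = set()
    rels = []
    while len(linked) < num_pairs:
        a, b = rng.sample(names, 2)  # two distinct nodes, so no self-loops
        pair = frozenset((a, b))
        if pair in linked:
            continue
        linked.add(pair)
        proximity = rng.randint(1, max_proximity)
        rels.append((a, b, proximity))
        rels.append((b, a, proximity))
    return names, rels


if __name__ == "__main__":
    names, rels = generate_graph(num_topics=10, num_pairs=15)
    print(len(names), "nodes,", len(rels), "relationships")
```

The two lists can then be written out in whatever CSV layout the import tool expects; the column layout is deliberately left out here.
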
I have also put together a Python script that generates a random weighted graph with textual labels, as a test case. I'd love to hear what other people find out... :)) Here's my contribution: https://groups.google.com/forum/#!topic/neo4j/UyqzNZwlKU4

On Thursday, 16 October 2014 11:23:03 UTC+2, Mark Findlater wrote:
>
> There is a lot of history here that I cannot follow, and Michael is clearly thinking about something, which means that the solution is not simple. But your profile (which reads bottom up) does not start well and isn't using your indexes. Unless I have missed something about why you cannot do this, your very last query should perform (much) better if it begins with an index hit rather than a TraversalMatcher.
>
> MATCH (n:Topic {name: "Topic66"}), (m:Topic {name: "Topic111"})
> WITH n, m
> MATCH p = (n)-[*..2]-(m)
> WITH p, n, m
> RETURN p, reduce(totProximity = 0, n IN relationships(p) | totProximity + n.proximity) AS pathProximity
> ORDER BY pathProximity;
>
> Also, your assertion "would give unique results since paths are the same ... huh ?" is incorrect: the paths are not the same. The nodes in the paths may be, but the relationships (the traversal routes) are not. Is there any reason for you to duplicate all of your relationships, given that you can navigate them in either direction anyway?
>
> Apologies if I have gone way off piste,
>
> M
>
> On Wednesday, 15 October 2014 23:12:35 UTC+1, gg4u wrote:
>
> Profile for the last query:
>
> profile MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and m.name = 'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> ==> 2411 rows
> ==>
> ==> ColumnFilter(0)
> ==>   |
> ==>   +Sort
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ColumnFilter(1)
> ==>         |
> ==>         +ExtractPath
> ==>           |
> ==>           +Filter
> ==>             |
> ==>             +TraversalMatcher
> ==>
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                             |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | ColumnFilter(0)  | 2411    | 0       |             | keep columns p, pathProximity                                     |
> ==> | Sort             | 2411    | 0       |             | Cached(pathProximity of type Any)                                 |
> ==> | Extract          | 2411    | 9640    |             | pathProximity                                                     |
> ==> | ColumnFilter(1)  | 2411    | 0       |             | keep columns p, n, m                                              |
> ==> | ExtractPath      | 2411    | 0       | p           |                                                                   |
> ==> | Filter           | 2411    | 4910094 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})  |
> ==> | TraversalMatcher | 1636698 | 1681810 |             | m, UNNAMED19, m                                                   |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
>
> On Thursday, 16 October 2014 00:01:33 UTC+2, gg4u wrote:
>
> Sure, I tried three examples with (n), (n:Topic) and allShortestPaths(), and also profiled them:
>
> 1.
>
> MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic1' and m.name = 'Topic2' return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity DESC LIMIT 6;
>
> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185 |
> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185 |
>
> ...
>
> ==> 6 rows
> ==> *162423 ms*
>
> profile MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic1' and m.name = 'Topic2' return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity DESC LIMIT 6;
>
> ==> 6 rows
> ==>
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +Filter
> ==>           |
> ==>           +TraversalMatcher
> ==>
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                             |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | ColumnFilter     | 6       | 0       |             | keep columns p, pathProximity                                     |
> ==> | Top              | 6       | 0       |             | { AUTOINT3}; Cached(pathProximity of type Any)                    |
> ==> | Extract          | 9       | 36      |             | pathProximity                                                     |
> ==> | ExtractPath      | 9       | 0       | p           |                                                                   |
> ==> | Filter           | 9       | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})  |
> ==> | TraversalMatcher | 1010795 | 1024307 |             | m, UNNAMED20, m                                                   |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==>
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 'Topic1' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> ==> 9 rows
> ==> *10111 ms*
>
> ==> 9 rows
> ==>
> ==> ColumnFilter
> ==>   |
> ==>   +Sort
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ShortestPath
> ==>         |
> ==>         +SchemaIndex(0)
> ==>           |
> ==>           +SchemaIndex(1)
> ==>
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> | Operator       | Rows | DbHits | Identifiers | Other                             |
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> | ColumnFilter   | 9    | 0      |             | keep columns p, pathProximity     |
> ==> | Sort           | 9    | 0      |             | Cached(pathProximity of type Any) |
> ==> | Extract        | 9    | 36     |             | pathProximity                     |
> ==> | ShortestPath   | 9    | 0      | p           |                                   |
> ==> | SchemaIndex(0) | 1    | 2      | m, m        | { AUTOSTRING1}; :Topic(name)      |
> ==> | SchemaIndex(1) | 1    | 2      | n, n        | { AUTOSTRING0}; :Topic(name)      |
> ==> +----------------+------+--------+-------------+-----------------------------------+
>
> 2.
>
> MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic44' and m.name = 'Topic2' return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity DESC LIMIT 6;
>
> ==> 6 rows
> ==> *906108 ms*
>
> ==> 6 rows
> ==>
> ==> ColumnFilter
> ==>   |
> ==>   +Top
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ExtractPath
> ==>         |
> ==>         +Filter
> ==>           |
> ==>           +TraversalMatcher
> ==>
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                             |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
> ==> | ColumnFilter     | 6       | 0       |             | keep columns p, pathProximity                                     |
> ==> | Top              | 6       | 0       |             | { AUTOINT3}; Cached(pathProximity of type Any)                    |
> ==> | Extract          | 67      | 268     |             | pathProximity                                                     |
> ==> | ExtractPath      | 67      | 0       | p           |                                                                   |
> ==> | Filter           | 67      | 3246003 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})  |
> ==> | TraversalMatcher | 1082001 | 1097166 |             | m, UNNAMED20, m                                                   |
> ==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 'Topic44' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> Magically, and for the first time:
> *146 ms*
>
> so:
>
> profile MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 'Topic44' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> ==> 67 rows
> ==>
> ==> ColumnFilter
> ==>   |
> ==>   +Sort
> ==>     |
> ==>     +Extract
> ==>       |
> ==>       +ShortestPath
> ==>         |
> ==>         +SchemaIndex(0)
> ==>           |
> ==>           +SchemaIndex(1)
> ==>
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> | Operator       | Rows | DbHits | Identifiers | Other                             |
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==> | ColumnFilter   | 67   | 0      |             | keep columns p, pathProximity     |
> ==> | Sort           | 67   | 0      |             | Cached(pathProximity of type Any) |
> ==> | Extract        | 67   | 268    |             | pathProximity                     |
> ==> | ShortestPath   | 67   | 0      | p           |                                   |
> ==> | SchemaIndex(0) | 1    | 2      | m, m        | { AUTOSTRING1}; :Topic(name)      |
> ==> | SchemaIndex(1) | 1    | 2      | n, n        | { AUTOSTRING0}; :Topic(name)      |
> ==> +----------------+------+--------+-------------+-----------------------------------+
> ==>
>
> 3.
> So I tried:
>
> MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name = 'Topic66' and m.name = 'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> 2 rows
> 34337 ms
>
> and
>
> MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and m.name = 'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity order by pathProximity;
>
> *2411 rows*
> *3228423 ms !!*
>
> Please also note that for each row there is a duplicate (in my structure I do have both (a:Topic)-[]->(b:Topic) and (b:Topic)-[]->(a:Topic), but I thought that (a:Topic)-[]-(b:Topic) would give unique results since paths are the same ... huh ?):
>
> ...
>
> ==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[68136874]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557 |
> ==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[1113182]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557 |
>
> So it seems that allShortestPaths() gives faster times and *almost* the wanted results *only* if the same searches were made previously (cached). Could that be true?
> It would partly make sense: I expect graph algorithms to be faster than retrieving general paths, but retrieving 67 rows of general paths cannot be that slow... (906108 ms, versus 146 ms for allShortestPaths()??)
>
> Would it help if I posted a Python script that generates a random structure similar to mine, posted again the configuration files used for my server and for the batch importer, and posted the header I used for loading the CSV with the batch importer? Then you could tell me whether the response time can get under 1 s (production time).
> Could you try the same tests and post the results and a step-by-step guide?
>
> On Wednesday, 15 October 2014 21:56:01 UTC+2, Michael Hunger wrote:
>
> Can you just try this please?
>
> MATCH p = (n:Topic)-[*0..2]-(m:Topic)
> where n.name = 'Topic1' and m.name = 'Topic2'
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
> order by pathProximity DESC LIMIT 6;
>
> On Wed, Oct 15, 2014 at 2:52 PM, gg4u wrote:
>
> Hi Michael,
>
> sorry, I don't understand what that means.
> Can I help you help me sort out the issue somehow? :)
>
> What could I check or correct?
> What is a pattern matcher, and can you teach me how to read the profile so that I can follow your conclusion?
> What are the possible reasons for selecting the wrong pattern matcher, and how can I correct it?
>
> thank you
>
> On Wednesday, 15 October 2014 14:04:57 UTC+2, Michael Hunger wrote:
>
> Hi,
>
> from the profiling it seems that Cypher selects the wrong pattern matcher if we separate the node-lookup and path-match.
>
> profile
> MATCH p = (n:Topic)-[*0..2]-(m:Topic)
> where n.name = 'Topic1' and m.name = 'Topic2'
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
> order by pathProximity DESC LIMIT 6;
>
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
> | Operator         | Rows | DbHits | Identifiers | Other                                                             |
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
> | ColumnFilter     | 0    | 0      |             | keep columns p, pathProximity                                     |
> | Top              | 0    | 0      |             | { AUTOINT3}; Cached(pathProximity of type Any)                    |
> | Extract          | 0    | 0      |             | pathProximity                                                     |
> | ExtractPath      | 0    | 0      | p           |                                                                   |
> | Filter           | 0    | 0      |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})  |
> | TraversalMatcher | 0    | 1      |             | m, UNNAMED20, m                                                   |
> +------------------+------+--------+-------------+-------------------------------------------------------------------+
>
> On Wed, Oct 15, 2014 at 11:00 AM, gg4u wrote:
>
> Hi Michael,
>
> your aggregation was only on the same paths, so you get 9 different paths but you didn't show the counts per path.
>
> That is not clear to me yet; I am going to post the results for each query you suggested trying.
>
> Rodger, to summarize a description of this test:
> 4M nodes labeled 'Topic'
> 100M rels (weighted)
> Index on Topic(name) (a string-type property on each node)
> 'Topic' dominates the whole dataset, and this will be a subgraph of a larger network (if I can get this to production response times, the next step will be a graph of 85M nodes and ~2B rels with the same type of structure, keeping properties on the nodes themselves rather than decoupling them into other nodes). So this is a first, real-case test to see whether it is feasible to use the Neo4j data structure vs. NoSQL.
> And I'd love the answer to be yes :D
>
> Michael, here is another test with other topics (I think not cached):
>
> MATCH (n:Topic), (m:Topic), p = (n)-[*0..2]-(m) where n.name = 'Topic100' and m.name = 'Topic2' with p, n, m return p, count(*) order by count(*);
>
> results:
> ==> +---+----------+
> ==> | p | count(*) |
> ==> +---+----------+
>
> ...
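
The duplicate rows discussed in this thread can be reproduced outside Neo4j. The sketch below is a plain-Python illustration, not Neo4j's matcher: it enumerates undirected paths of at most two hops, never reusing a relationship within one path, and sums `proximity` the way the `reduce(...)` expression in the queries does. The relationship ids and weights are copied from the two result rows quoted above for Topic66 and Topic111; the relationship directions and the helper names (`paths_up_to`, `path_proximity`) are assumptions.

```python
from functools import reduce

# (rel_id, start, end, proximity). Ids and weights come from the rows quoted
# in the thread; the directions are assumed for illustration.
RELS = [
    (68136903, "Topic66", "Topic113", 189),
    (68136874, "Topic113", "Topic111", 368),
    (1113182, "Topic111", "Topic113", 368),
]


def paths_up_to(rels, start, end, max_hops):
    """Return every undirected path from start to end with 1..max_hops hops.

    A relationship is never used twice within one path, but two parallel
    relationships between the same nodes produce two distinct paths.
    """
    found = []

    def walk(node, used):
        if used and node == end:
            found.append(tuple(used))
        if len(used) == max_hops:
            return
        for rel in rels:
            if rel in used:
                continue
            _, a, b, _ = rel
            # Direction is ignored, as with the (a)-[]-(b) pattern.
            if a == node:
                walk(b, used + [rel])
            elif b == node:
                walk(a, used + [rel])

    walk(start, [])
    return found


def path_proximity(path):
    """Sum proximity over a path's relationships, like reduce() in the queries."""
    return reduce(lambda total, rel: total + rel[3], path, 0)


if __name__ == "__main__":
    for path in paths_up_to(RELS, "Topic66", "Topic111", 2):
        print([rel[0] for rel in path], path_proximity(path))
```

Both paths visit the same three nodes but differ in the second relationship, so they are separate results with the same total. Storing each link once, or deduplicating on the node sequence, would collapse them.
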
