Sure, I tried three examples with (n), (n:Topic) and allShortestPaths(), and
profiled each of them:
1.
MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic1' and
m.name = 'Topic2' return p, reduce(totProximity = 0, n IN
relationships(p)| totProximity + n.proximity) AS pathProximity order by
pathProximity DESC LIMIT 6;
==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185 |
==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic3"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 185 |
...
==> 6 rows
==> 162423 ms
profile MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic1'
and m.name = 'Topic2' return p, reduce(totProximity = 0, n IN
relationships(p)| totProximity + n.proximity) AS pathProximity order by
pathProximity DESC LIMIT 6;
==> 6 rows
==>
==> ColumnFilter
==> |
==> +Top
==> |
==> +Extract
==> |
==> +ExtractPath
==> |
==> +Filter
==> |
==> +TraversalMatcher
==>
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                            |
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
==> | ColumnFilter     | 6       | 0       |             | keep columns p, pathProximity                                    |
==> | Top              | 6       | 0       |             | { AUTOINT3}; Cached(pathProximity of type Any)                   |
==> | Extract          | 9       | 36      |             | pathProximity                                                    |
==> | ExtractPath      | 9       | 0       | p           |                                                                  |
==> | Filter           | 9       | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1}) |
==> | TraversalMatcher | 1010795 | 1024307 |             | m, UNNAMED20, m                                                  |
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
==>
MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name =
'Topic1' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity =
0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
order by pathProximity;
==> 9 rows
==> 10111 ms
==> 9 rows
==>
==> ColumnFilter
==> |
==> +Sort
==> |
==> +Extract
==> |
==> +ShortestPath
==> |
==> +SchemaIndex(0)
==> |
==> +SchemaIndex(1)
==>
==> +----------------+------+--------+-------------+-----------------------------------+
==> | Operator       | Rows | DbHits | Identifiers | Other                             |
==> +----------------+------+--------+-------------+-----------------------------------+
==> | ColumnFilter   | 9    | 0      |             | keep columns p, pathProximity     |
==> | Sort           | 9    | 0      |             | Cached(pathProximity of type Any) |
==> | Extract        | 9    | 36     |             | pathProximity                     |
==> | ShortestPath   | 9    | 0      | p           |                                   |
==> | SchemaIndex(0) | 1    | 2      | m, m        | { AUTOSTRING1}; :Topic(name)      |
==> | SchemaIndex(1) | 1    | 2      | n, n        | { AUTOSTRING0}; :Topic(name)      |
==> +----------------+------+--------+-------------+-----------------------------------+
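
To compare the two plans for example 1 at a glance, the DbHits columns can simply be added up. The snippet below is only a rough Python sketch: the numbers are copied from the two profiles above, while the dictionary and function names are made up for illustration and are not part of Neo4j.

```python
# DbHits per operator, copied from the two profiles of example 1.
traversal_plan = {
    "ColumnFilter": 0,
    "Top": 0,
    "Extract": 36,
    "ExtractPath": 0,
    "Filter": 3032385,
    "TraversalMatcher": 1024307,
}
shortest_path_plan = {
    "ColumnFilter": 0,
    "Sort": 0,
    "Extract": 36,
    "ShortestPath": 0,
    "SchemaIndex(0)": 2,
    "SchemaIndex(1)": 2,
}

# Rows produced by TraversalMatcher, also copied from the profile.
TRAVERSAL_ROWS = 1010795


def total_db_hits(plan):
    """Sum the DbHits of every operator in a plan."""
    return sum(plan.values())


def hottest_operator(plan):
    """Return the operator with the most DbHits."""
    return max(plan, key=plan.get)


traversal_total = total_db_hits(traversal_plan)
shortest_total = total_db_hits(shortest_path_plan)
hits_per_row = traversal_plan["Filter"] / TRAVERSAL_ROWS

print("traversal plan:", traversal_total, "DbHits, hottest:", hottest_operator(traversal_plan))
print("shortest-path plan:", shortest_total, "DbHits")
print("Filter DbHits per TraversalMatcher row:", hits_per_row)
```

With these numbers the traversal plan totals 4056728 DbHits against 40 for the shortest-path plan, and the Filter step alone costs exactly three hits for each of the 1010795 rows produced by TraversalMatcher.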
2.
MATCH p = (n:Topic)-[*0..2]-(m:Topic) where n.name = 'Topic44' and
m.name = 'Topic2' return p, reduce(totProximity = 0, n IN
relationships(p)| totProximity + n.proximity) AS pathProximity order by
pathProximity DESC LIMIT 6;
==> 6 rows
==> 906108 ms
==> 6 rows
==>
==> ColumnFilter
==> |
==> +Top
==> |
==> +Extract
==> |
==> +ExtractPath
==> |
==> +Filter
==> |
==> +TraversalMatcher
==>
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                            |
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
==> | ColumnFilter     | 6       | 0       |             | keep columns p, pathProximity                                    |
==> | Top              | 6       | 0       |             | { AUTOINT3}; Cached(pathProximity of type Any)                   |
==> | Extract          | 67      | 268     |             | pathProximity                                                    |
==> | ExtractPath      | 67      | 0       | p           |                                                                  |
==> | Filter           | 67      | 3246003 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1}) |
==> | TraversalMatcher | 1082001 | 1097166 |             | m, UNNAMED20, m                                                  |
==> +------------------+---------+---------+-------------+------------------------------------------------------------------+
MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name =
'Topic44' and m.name = 'Topic2' with p, n, m return p, reduce(totProximity
= 0, n IN relationships(p)| totProximity + n.proximity) AS pathProximity
order by pathProximity;
Magically, and on the very first run, this took:
146 ms
So I profiled it:
profile MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where
n.name = 'Topic44' and m.name = 'Topic2' with p, n, m return p,
reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity)
AS pathProximity order by pathProximity;
==> 67 rows
==>
==> ColumnFilter
==> |
==> +Sort
==> |
==> +Extract
==> |
==> +ShortestPath
==> |
==> +SchemaIndex(0)
==> |
==> +SchemaIndex(1)
==>
==> +----------------+------+--------+-------------+-----------------------------------+
==> | Operator       | Rows | DbHits | Identifiers | Other                             |
==> +----------------+------+--------+-------------+-----------------------------------+
==> | ColumnFilter   | 67   | 0      |             | keep columns p, pathProximity     |
==> | Sort           | 67   | 0      |             | Cached(pathProximity of type Any) |
==> | Extract        | 67   | 268    |             | pathProximity                     |
==> | ShortestPath   | 67   | 0      | p           |                                   |
==> | SchemaIndex(0) | 1    | 2      | m, m        | { AUTOSTRING1}; :Topic(name)      |
==> | SchemaIndex(1) | 1    | 2      | n, n        | { AUTOSTRING0}; :Topic(name)      |
==> +----------------+------+--------+-------------+-----------------------------------+
==>
3.
So I tried:
MATCH p = allShortestPaths((n:Topic)-[*..2]-(m:Topic)) where n.name =
'Topic66' and m.name = 'Topic111' with p, n, m return p,
reduce(totProximity = 0, n IN relationships(p)| totProximity + n.proximity)
AS pathProximity order by pathProximity;
2 rows
34337 ms
and
MATCH p = (n:Topic)-[*..2]-(m:Topic) where n.name = 'Topic66' and m.name =
'Topic111' with p, n, m return p, reduce(totProximity = 0, n IN
relationships(p)| totProximity + n.proximity) AS pathProximity order by
pathProximity;
2411 rows
3228423 ms !!
Please also note that each row has a duplicate. In my structure I do have
both (a:Topic)-[]->(b:Topic) and (b:Topic)-[]->(a:Topic), but I thought that
the undirected pattern (a:Topic)-[]-(b:Topic) would give unique results,
since the paths go through the same nodes. Is that not the case?
...
==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[68136874]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557 |
==> | [Node[1103460]{id:18831,name:"Topic66"},:P_Topic_Link[68136903]{proximity:189},Node[1198508]{id:19594028,name:"Topic113"},:P_Topic_Link[1113182]{proximity:368},Node[1603710]{id:22939,name:"Topic111"}] | 557 |
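
A possible explanation for the duplicates, sketched in Python rather than Cypher: the two rows above visit the same three nodes but use different relationship ids (68136874 and 1113182) for the second hop, so they count as two distinct paths. The toy model below assumes those two ids are a forward and a reverse link between Topic113 and Topic111; the directions are my assumption, since the output does not show them, and all helper names are hypothetical.

```python
from collections import namedtuple

# One relationship: id, start node, end node, proximity weight.
Rel = namedtuple("Rel", ["rel_id", "start", "end", "proximity"])

# Ids and proximities are taken from the two rows above; the directions are
# assumed (one forward and one reverse link between Topic113 and Topic111).
RELS = [
    Rel(68136903, "Topic66", "Topic113", 189),
    Rel(68136874, "Topic113", "Topic111", 368),
    Rel(1113182, "Topic111", "Topic113", 368),
]


def neighbours(node):
    """Yield (relationship, other node) for every link touching node, ignoring direction."""
    for rel in RELS:
        if rel.start == node:
            yield rel, rel.end
        elif rel.end == node:
            yield rel, rel.start


def two_hop_paths(source, target):
    """Return every undirected 2-hop path as (node names, relationships)."""
    paths = []
    for first, middle in neighbours(source):
        for second, last in neighbours(middle):
            # A relationship may be used only once per path.
            if second.rel_id != first.rel_id and last == target:
                paths.append(((source, middle, target), (first, second)))
    return paths


def path_proximity(rels):
    """Same sum as reduce(totProximity = 0, n IN relationships(p) | totProximity + n.proximity)."""
    return sum(rel.proximity for rel in rels)


paths = two_hop_paths("Topic66", "Topic111")
for nodes, rels in paths:
    print(nodes, [rel.rel_id for rel in rels], path_proximity(rels))

# Collapsing on the node sequence leaves a single logical path.
distinct_node_paths = {nodes for nodes, _ in paths}
print(len(paths), "paths,", len(distinct_node_paths), "distinct node sequence")
```

If this reading is right, the undirected pattern is not the cause by itself: each parallel relationship between the same two nodes yields its own path, so the rows would have to be collapsed on the node sequence to get unique results.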
So allShortestPaths() seems to be faster and to give almost the results I
want, but only if a previous search was made (cached). Could that be true?
It would make partial sense: I expect graph algorithms to be faster than
retrieving general paths, but retrieving 67 rows of general paths cannot be
that slow (906108 ms against 146 ms, more than three orders of magnitude
slower than allShortestPaths()?).
Would it make sense if I post a Python script that generates a random
structure similar to the one I have, post again the configuration files
used for my server and the batch importer, and post the header I used for
loading the CSV with the batch importer, so that you could tell me whether a
response time under 1 s (production time) is achievable?
Could you then run the same tests and post the results with a step-by-step guide?
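
As a starting point for such a script, here is a minimal sketch. It assumes one label, a `name` property and a weighted `proximity` on each link, and it keeps the ratio of 25 rels per node from my test (100M rels on 4M nodes) at a much smaller scale. The column names, the 50% chance of adding a reverse link and the proximity range are placeholders I made up; they are not the header I actually used with the batch importer, so they would need adapting.

```python
import csv
import io
import random


def generate_topics(n_topics):
    """Return topic names Topic1 .. TopicN."""
    return ["Topic%d" % i for i in range(1, n_topics + 1)]


def generate_links(topics, n_links, seed=42, max_proximity=400):
    """Return (start, end, proximity) tuples between distinct random topics.

    Roughly half of the links also get a reverse link with the same
    proximity, to mimic having both a->b and b->a in the graph.
    """
    rng = random.Random(seed)
    links = []
    while len(links) < n_links:
        a, b = rng.sample(topics, 2)  # two different topics, so no self-loops
        proximity = rng.randint(1, max_proximity)
        links.append((a, b, proximity))
        if len(links) < n_links and rng.random() < 0.5:
            links.append((b, a, proximity))
    return links


def write_tsv(rows, header, out):
    """Write a header line and the rows as tab-separated values."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Small demo: 1000 topics and 25000 links, written to in-memory buffers.
topics = generate_topics(1000)
links = generate_links(topics, 25000)

nodes_out = io.StringIO()
rels_out = io.StringIO()
write_tsv([[name] for name in topics], ["name"], nodes_out)
write_tsv(links, ["start", "end", "proximity"], rels_out)

print(nodes_out.getvalue().splitlines()[:3])
print(rels_out.getvalue().splitlines()[:3])
```

Replacing the `io.StringIO` buffers with files opened for writing would produce the two CSV files; the seed is fixed so that everyone who runs the script gets the same structure.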
On Wednesday, 15 October 2014 at 21:56:01 UTC+2, Michael Hunger wrote:
>
> Can you just try this please?
>
> MATCH p = (n:Topic)-[*0..2]-(m:Topic)
> where n.name = 'Topic1' and m.name = 'Topic2'
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity +
> n.proximity) AS pathProximity
> order by pathProximity DESC LIMIT 6;
>
>
>
> On Wed, Oct 15, 2014 at 2:52 PM, gg4u <[email protected]> wrote:
>
>> Hi Michael,
>>
>> sorry, I don't understand what that means.
>> Can I help you to help me sort out the issue somehow? :)
>>
>> What could I check or correct?
>> What is a pattern matcher, and can you teach me how to read the profile
>> so that I can see how you reached your conclusion?
>> What are the possible reasons for selecting the wrong pattern matcher,
>> and how can I correct it?
>>
>> thank you
>>
>> On Wednesday, 15 October 2014 at 14:04:57 UTC+2, Michael Hunger wrote:
>>>
>>> Hi,
>>>
>>> from the profiling it seems that Cypher selects the wrong pattern
>>> matcher if we separate the node-lookup and path-match.
>>>
>>> profile
>>> MATCH p = (n:Topic)-[*0..2]-(m:Topic)
>>> where n.name = 'Topic1' and m.name = 'Topic2'
>>> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity
>>> + n.proximity) AS pathProximity
>>> order by pathProximity DESC LIMIT 6;
>>>
>>>
>>> +------------------+------+--------+-------------+------------------------------------------------------------------+
>>> | Operator         | Rows | DbHits | Identifiers | Other                                                            |
>>> +------------------+------+--------+-------------+------------------------------------------------------------------+
>>> | ColumnFilter     | 0    | 0      |             | keep columns p, pathProximity                                    |
>>> | Top              | 0    | 0      |             | { AUTOINT3}; Cached(pathProximity of type Any)                   |
>>> | Extract          | 0    | 0      |             | pathProximity                                                    |
>>> | ExtractPath      | 0    | 0      | p           |                                                                  |
>>> | Filter           | 0    | 0      |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1}) |
>>> | TraversalMatcher | 0    | 1      |             | m, UNNAMED20, m                                                  |
>>> +------------------+------+--------+-------------+------------------------------------------------------------------+
>>>
>>> On Wed, Oct 15, 2014 at 11:00 AM, gg4u <[email protected]> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> your aggregation was only on the same paths, so you get 9 different
>>>>> paths but you didn't show the counts per path.
>>>>>
>>>>
>>>> That is not clear to me yet; I am going to post the results for each
>>>> query you suggested trying out.
>>>>
>>>> Rodger, to summarize this test:
>>>> 4M nodes labeled 'Topic'
>>>> 100M rels (weighted)
>>>> Index on Topic(name), a string property present on every node
>>>> 'Topic' dominates the whole dataset, and this will be a subgraph of a
>>>> larger network (if I can get this to production response times, the next
>>>> step will be a graph of 85M nodes and ~2B rels with the same type of
>>>> structure, keeping properties on the nodes rather than decoupling them
>>>> into other nodes). So this is a first, real-case test to see whether
>>>> using the Neo4j data structure is feasible compared with NoSQL.
>>>> And I'd love the answer to be yes :D
>>>>
>>>> Michael, here is another test with other topics (I think not cached):
>>>>
>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name =
>>>> 'Topic100' and m.name = 'Topic2' with p, n, m return p, count(*)
>>>> order by count(*);
>>>>
>>>> results:
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> | p
>>>>
>>>>
>>>>
>>>>
>>>> | count(*) |
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[10618620]{proximity:90},Node[3528892]{id:411782,name:"Topic101"},:P_Topic_Link[1025954]{proximity:68},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[2424845]{proximity:91},Node[3719110]{id:52502,name:"Topic102"},:P_Topic_Link[1025923]{proximity:85},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[100682940]{proximity:19},Node[3461206]{id:39782569,name:"Topic103"},:P_Topic_Link[100682931]{proximity:107},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>> ==> | [Node[4114904]{id:7955,name:"Topic100"},:P_Topic_Link[21653222]{proximity:82},Node[706102]{id:1551073,name:"Topic104"},:P_Topic_Link[21653218]{proximity:87},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>
>>>> (.... results ...)
>>>>
>>>> ==> +-----------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ------------------------------------------------------------
>>>> ---------------+
>>>> ==> 67 rows
>>>> ==> 3900775 ms
>>>>
>>>>
>>>>
>>>> On Tuesday, 14 October 2014 at 22:54:43 UTC+2, Michael Hunger wrote:
>>>>>
>>>>> How many rows does this return?
>>>>>
>>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name =
>>>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*) order
>>>>> by count(*);
>>>>>
>>>>> your aggregation was only on the same paths, so you get 9 different
>>>>> paths but you didn't show the counts per path.
>>>>>
>>>>
>>>>>
>>>>
>>>>> and obtain 9 rows in 182799 ms
>>>>>
>>>>> On Tue, Oct 14, 2014 at 10:59 AM, gg4u <[email protected]> wrote:
>>>>>
>>>>>> Yes:
>>>>>>
>>>>>> neo4j-sh (?)$ profile MATCH (n:Topic), (m:Topic) where n.name =
>>>>>> 'Topic1' and m.name = 'Topic2' MATCH p = (n)-[*0..2]-(m) return p,
>>>>>> reduce(totProximity = 0, n IN relationships(p)| totProximity +
>>>>>> n.proximity)
>>>>>> AS pathProximity order by pathProximity DESC LIMIT 6;
>>>>>> ==>
>>>>>> [...results...]
>>>>>> ==> 6 rows
>>>>>> ==>
>>>>>> ==> ColumnFilter
>>>>>> ==> |
>>>>>> ==> +Top
>>>>>> ==> |
>>>>>> ==> +Extract
>>>>>> ==> |
>>>>>> ==> +ExtractPath
>>>>>> ==> |
>>>>>> ==> +PatternMatcher
>>>>>> ==> |
>>>>>> ==> +SchemaIndex(0)
>>>>>> ==> |
>>>>>> ==> +SchemaIndex(1)
>>>>>> ==>
>>>>>> ==> +----------------+------+--------+-----------------+------------------------------------------------+
>>>>>> ==> | Operator       | Rows | DbHits | Identifiers     | Other                                          |
>>>>>> ==> +----------------+------+--------+-----------------+------------------------------------------------+
>>>>>> ==> | ColumnFilter   | 6    | 0      |                 | keep columns p, pathProximity                  |
>>>>>> ==> | Top            | 6    | 0      |                 | { AUTOINT3}; Cached(pathProximity of type Any) |
>>>>>> ==> | Extract        | 9    | 36     |                 | pathProximity                                  |
>>>>>> ==> | ExtractPath    | 9    | 0      | p               |                                                |
>>>>>> ==> | PatternMatcher | 9    | 0      | n, m, UNNAMED94 |                                                |
>>>>>> ==> | SchemaIndex(0) | 1    | 2      | m, m            | { AUTOSTRING1}; :Topic(name)                   |
>>>>>> ==> | SchemaIndex(1) | 1    | 2      | n, n            | { AUTOSTRING0}; :Topic(name)                   |
>>>>>> ==> +----------------+------+--------+-----------------+------------------------------------------------+
>>>>>> ==>
>>>>>> neo4j-sh (?)$
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tuesday, 14 October 2014 at 10:00:29 UTC+2, Michael Hunger wrote:
>>>>>>>
>>>>>>> Can you try this:
>>>>>>>
>>>>>>> profile
>>>>>>> MATCH (n:Topic), (m:Topic)
>>>>>>> where n.name = 'Topic1' and m.name = 'Topic2'
>>>>>>> MATCH p = (n)-[*0..2]-(m)
>>>>>>> return p, reduce(totProximity = 0, n IN relationships(p)|
>>>>>>> totProximity + n.proximity) AS pathProximity
>>>>>>> order by pathProximity DESC
>>>>>>> LIMIT 6
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 14, 2014 at 9:06 AM, gg4u <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Rodger,
>>>>>>>>
>>>>>>>> thank you for your insights!
>>>>>>>> please see comments below:
>>>>>>>>
>>>>>>>> On Monday, 13 October 2014 at 18:37:50 UTC+2, Rodger wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I've done a lot of RDBMS performance tuning.
>>>>>>>>> Just a few quick thoughts.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Be sure to run the queries in the shell, if you are not already
>>>>>>>>> doing so.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, they are run in the shell:
>>>>>>>> http://localhost:7474/webadmin/#/console/
>>>>>>>>
>>>>>>>>
>>>>>>>>> How many rows are returned? Just sorting, then returning many
>>>>>>>>> rows,
>>>>>>>>> takes a long time to scroll them to output.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> 9 rows
>>>>>>>> In the answer above, I wrote 9 paths
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you are getting duplicates, it may be the equivalent of a
>>>>>>>>> cartesian product,
>>>>>>>>> one of the worst things that can happen in RDBMS, and also one
>>>>>>>>> of the least known. See my presentation on them here:
>>>>>>>>> http://rodgersnotes.wordpress.com/2010/09/15/stamping-out-cartesian-products/
>>>>>>>>>
>>>>>>>>
>>>>>>>> So I had a look at your PDF,
>>>>>>>> http://rodgersnotes.files.wordpress.com/2010/09/cartprodwordpress.pdf
>>>>>>>> page 11
>>>>>>>>
>>>>>>>> and I think the idea you want to suggest is to avoid duplicates
>>>>>>>> (you called them 'cartesian products') by enforcing conditions.
>>>>>>>> However, since this is a graph db and not a relational one, it is not
>>>>>>>> clear to me where this applies, because in the graph db I don't have
>>>>>>>> 'joined' queries between tables,
>>>>>>>> so the conditions I have are, at least in my case, properties
>>>>>>>> (index on properties) and undirected rels.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Try:
>>>>>>>>>
>>>>>>>>> return p, count (*)
>>>>>>>>> order by count(*)
>>>>>>>>>
>>>>>>>>
>>>>>>>> I run:
>>>>>>>>
>>>>>>>> profile MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where
>>>>>>>> n.name = 'Topic1' and m.name = 'Topic2' with p, n, m return p,
>>>>>>>> count(*) order by count(*);
>>>>>>>>
>>>>>>>> and I've got: (see there are also duplicates in paths: is it
>>>>>>>> because I have both (a)-[]->(b) and (a)<-[]-(b) ?)
>>>>>>>>
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> | p
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> | count(*) |
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[71185298]{proximity:68},Node[1401899]{id:21375850,name:"Topic3"},:P_Topic_Link[71185313]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[88675719]{proximity:28},Node[2594397]{id:31760062,name:"Topic4"},:P_Topic_Link[88675745]{proximity:23},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[30736000]{proximity:32},Node[2515502]{id:3106745,name:"Topic5"},:P_Topic_Link[30735974]{proximity:82},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[68206383]{proximity:72},Node[1202629]{id:19635605,name:"Topic6"},:P_Topic_Link[68206440]{proximity:32},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[98898126]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[58107755]{proximity:55},Node[506613]{id:13841207,name:"Topic8"},:P_Topic_Link[58107766]{proximity:27},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[98898173]{proximity:23},Node[3329750]{id:38567205,name:"Topic7"},:P_Topic_Link[1025873]{proximity:124},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[5662565]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> | [Node[103105]{id:1092923,name:"Topic1"},:P_Topic_Link[5662626]{proximity:47},Node[736816]{id:157427,name:"Topic9"},:P_Topic_Link[1025864]{proximity:138},Node[1386672]{id:21245,name:"Topic2"}] | 1 |
>>>>>>>> ==> +-----------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------+
>>>>>>>> ==> 9 rows
>>>>>>>> ==>
>>>>>>>> ==> ColumnFilter(0)
>>>>>>>> ==> |
>>>>>>>> ==> +Sort
>>>>>>>> ==> |
>>>>>>>> ==> +EagerAggregation
>>>>>>>> ==> |
>>>>>>>> ==> +ColumnFilter(1)
>>>>>>>> ==> |
>>>>>>>> ==> +ExtractPath
>>>>>>>> ==> |
>>>>>>>> ==> +Filter
>>>>>>>> ==> |
>>>>>>>> ==> +TraversalMatcher
>>>>>>>> ==>
>>>>>>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>>>>>>> ==> | Operator         | Rows    | DbHits  | Identifiers | Other                                                                           |
>>>>>>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>>>>>>> ==> | ColumnFilter(0)  | 9       | 0       |             | keep columns p, count(*)                                                        |
>>>>>>>> ==> | Sort             | 9       | 0       |             | Cached( INTERNAL_AGGREGATE931614f3-4def-4fc4-a80b-c6fca3839817 of type Integer) |
>>>>>>>> ==> | EagerAggregation | 9       | 0       |             | p                                                                               |
>>>>>>>> ==> | ColumnFilter(1)  | 9       | 0       |             | keep columns p, n, m                                                            |
>>>>>>>> ==> | ExtractPath      | 9       | 0       | p           |                                                                                 |
>>>>>>>> ==> | Filter           | 9       | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == { AUTOSTRING1})                |
>>>>>>>> ==> | TraversalMatcher | 1010795 | 1024307 |             | m, UNNAMED36, m                                                                 |
>>>>>>>> ==> +------------------+---------+---------+-------------+---------------------------------------------------------------------------------+
>>>>>>>> ==>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without me looking at the raw data, and the query result, you
>>>>>>>>> seem to have many operations going on. So, you have a lot of rows
>>>>>>>>> in
>>>>>>>>> the profile output.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Only 9
>>>>>>>>
>>>>>>>>
>>>>>>>>> As a general rule, the more rows there are in the
>>>>>>>>> profile, the slower the response time is.
>>>>>>>>> ie. the more complex the query, the slower it is.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I were looking at this, I would try to isolate which part of
>>>>>>>>> the query is the slow part. The Return clause, or the Match
>>>>>>>>> clause?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You've already tried the response times with the data.
>>>>>>>>> Try to simply:
>>>>>>>>> return count(*) .
>>>>>>>>>
>>>>>>>>
>>>>>>>> I run:
>>>>>>>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m) where n.name =
>>>>>>>> 'Topic1' and m.name = 'Topic2' with p, n, m return p, count(*)
>>>>>>>> order by count(*);
>>>>>>>>
>>>>>>>> and obtain 9 rows in 182799 ms
>>>>>>>>
>>>>>>>> I run:
>>>>>>>> MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and m.name =
>>>>>>>> 'Topic2' with n, m return count(*);
>>>>>>>>
>>>>>>>> and obtain 856ms
>>>>>>>>
>>>>>>>>
>>>>>>>> profile MATCH (n:Topic), (m:Topic) where n.name = 'Topic1' and
>>>>>>>> m.name = 'Topic2' with n, m return count(*);
>>>>>>>>
>>>>>>>> results in:
>>>>>>>>
>>>>>>>>
>>>>>>>> ==> ColumnFilter
>>>>>>>> ==> |
>>>>>>>> ==> +EagerAggregation
>>>>>>>> ==> |
>>>>>>>> ==> +SchemaIndex(0)
>>>>>>>> ==> |
>>>>>>>> ==> +SchemaIndex(1)
>>>>>>>> ==>
>>>>>>>> ==> +------------------+------+--------+-------------+------------------------------+
>>>>>>>> ==> | Operator         | Rows | DbHits | Identifiers | Other                        |
>>>>>>>> ==> +------------------+------+--------+-------------+------------------------------+
>>>>>>>> ==> | ColumnFilter     | 1    | 0      |             | keep columns count(*)        |
>>>>>>>> ==> | EagerAggregation | 1    | 0      |             |                              |
>>>>>>>> ==> | SchemaIndex(0)   | 1    | 2      | m, m        | { AUTOSTRING1}; :Topic(name) |
>>>>>>>> ==> | SchemaIndex(1)   | 1    | 2      | n, n        | { AUTOSTRING0}; :Topic(name) |
>>>>>>>> ==> +------------------+------+--------+-------------+------------------------------+
>>>>>>>>
>>>>>>>>
>>>>>>>>> How many seconds response time is that, versus the original query?
>>>>>>>>> What is the resulting profile?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> So, it looks like it actually takes a huge amount of time to traverse
>>>>>>>> the graph, but a reasonable time (~900 ms) to match a node on its full
>>>>>>>> name string.
>>>>>>>>
>>>>>>>> Any idea for improving the performance of the traversal?
>>>>>>>>
>>>>>>>> It is a real problem, since I met the same problem even when getting
>>>>>>>> the first neighbors of a node, which currently makes this unfeasible
>>>>>>>> for production.
>>>>>>>> Has anyone with a real case of similar graph size and structure tried
>>>>>>>> to perform a similar query?
>>>>>>>>
>>>>>>>> As an example, this query to obtain the first neighbors of node Topic44:
>>>>>>>>
>>>>>>>> MATCH (n:Topic) , (m), p = (n)-[*0..1]-(m)
>>>>>>>> where n.name = 'Topic44'
>>>>>>>> with p, n, m
>>>>>>>> return p, reduce(totProximity = 0, n IN relationships(p)|
>>>>>>>> totProximity + n.proximity) AS pathProximity order by pathProximity
>>>>>>>> DESC
>>>>>>>> LIMIT 6
>>>>>>>>
>>>>>>>> returns
>>>>>>>> 6 rows in ~65000 ms, versus 6 rows in less than a second with a NoSQL store.
>>>>>>>>
>>>>>>>> Any idea?
>>>>>>>>
>>>>>>>> thank you guys for helping!! Hope to find a solution soon..
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> See also the tuning presentations I've done:
>>>>>>>>> http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/
>>>>>>>>> http://rodgersnotes.wordpress.com/2014/06/08/tuning-the-untunable-when-indexes-and-optimizer-dont-help-2/
>>>>>>>>> They are quick reads.
>>>>>>>>>
>>>>>>>> Thank you, I have seen them;
>>>>>>>> they are mostly about SQL tuning.
>>>>>>>> I've just used the Neo4j structure to store a graph with the same label
>>>>>>>> on 4M topics (I MUST keep it with one label), with an index on the
>>>>>>>> Topic(name) property, and used Cypher to query the db.
>>>>>>>> This is my data structure.
>>>>>>>>
>>>>>>>>> I've put a number of principles in there that you
>>>>>>>>> might apply.
>>>>>>>>> ie. Could you create the NEO4J equivalent of a temp table?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, October 9, 2014 2:41:47 AM UTC-5, gg4u wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Micheal, thank you.
>>>>>>>>>> sure I post my profile result here below !
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Neo4j" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>