Re: [Neo4j] Traversing Large (weighted) graphs: performance, data structure, indexes

gg4u Thu, 09 Oct 2014 00:42:08 -0700

Hi Michael, thank you.
Sure, I am posting my profile results below.


On Saturday, 4 October 2014 00:25:48 UTC+2, Michael Hunger wrote:
>
> How many paths are returned from your query?
>

In this case, 9 paths.
 

>
> MATCH p = (n)-[*0..2]-(m)
> where id(n) = 103105 and id(m) = 1386672
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity 
> order by pathProximity DESC;
>
> your index is on :Topic(name) ?
>


neo4j-sh (?)$ schema
==> Indexes
==>   ON :Topic(name) ONLINE
==>   ON :Topic(id)   ONLINE
==>   ON :topic(name) ONLINE
==>   ON :topic(id)   ONLINE
==>
==> No constraints

 

>
> MATCH p = (n:Topic)-[*0..2]-(m:Topic)
> where n.name = 'title-1' and m.name = 'title-2'
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity 
> order by pathProximity DESC;
>
> Can you profile your queries?
>
 

>
> go to: http://localhost:7474/webadmin/#/console/
>
> and enter:
>
> profile MATCH p = (n)-[*0..2]-(m)
> where id(n) = 103105 and id(m) = 1386672
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity 
> order by pathProximity DESC;
>



==> 9 rows
==>
==> ColumnFilter
==>   |
==>   +Sort
==>     |
==>     +Extract
==>       |
==>       +ExtractPath
==>         |
==>         +PatternMatcher
==>           |
==>           +NodeByIdOrEmpty(0)
==>             |
==>             +NodeByIdOrEmpty(1)
==>
==> +--------------------+------+--------+-------------------+-----------------------------------+
==> |           Operator | Rows | DbHits |       Identifiers |                             Other |
==> +--------------------+------+--------+-------------------+-----------------------------------+
==> |       ColumnFilter |    9 |      0 |                   |     keep columns p, pathProximity |
==> |               Sort |    9 |      0 |                   | Cached(pathProximity of type Any) |
==> |            Extract |    9 |     36 |                   |                     pathProximity |
==> |        ExtractPath |    9 |      0 |                 p |                                   |
==> |     PatternMatcher |    9 |      0 | n, m,   UNNAMED13 |                                   |
==> | NodeByIdOrEmpty(0) |    1 |      1 |              m, m |                      {  AUTOINT1} |
==> | NodeByIdOrEmpty(1) |    1 |      1 |              n, n |                      {  AUTOINT0} |
==> +--------------------+------+--------+-------------------+-----------------------------------+

 

>
> and
>
> profile MATCH p = (n:Topic)-[*0..2]-(m:Topic)
> where n.name = 'title-1' and m.name = 'title-2'
> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
> n.proximity) AS pathProximity 
> order by pathProximity DESC;
>




==> 9 rows
==>
==> ColumnFilter
==>   |
==>   +Sort
==>     |
==>     +Extract
==>       |
==>       +ExtractPath
==>         |
==>         +Filter
==>           |
==>           +TraversalMatcher
==>
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |         Operator |    Rows |  DbHits | Identifiers |                                                             Other |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+
==> |     ColumnFilter |       9 |       0 |             |                                     keep columns p, pathProximity |
==> |             Sort |       9 |       0 |             |                                 Cached(pathProximity of type Any) |
==> |          Extract |       9 |      36 |             |                                                     pathProximity |
==> |      ExtractPath |       9 |       0 |           p |                                                                   |
==> |           Filter |       9 | 3032385 |             | (hasLabel(m:Topic(0)) AND Property(m,name(1)) == {  AUTOSTRING1}) |
==> | TraversalMatcher | 1010795 | 1024307 |             |                                                 m,   UNNAMED19, m |
==> +------------------+---------+---------+-------------+-------------------------------------------------------------------+

 

>
> and share the results
>

(
Also, in case it is useful: when I start the server I get this warning 
message:
./neo4j console
WARNING: Max 256 open files allowed, minimum of 40 000 recommended. See the 
Neo4j manual.

I wondered whether it has something to do with how files in the db are 
looked up?
)


One more question about improving the query:
the results contain duplicates (the same path can occur more than once).
I do not understand why. I thought it was because I used undirected rels, 
but the results are not consistent: some paths are duplicated, some are not.
Should I use a collection function to remove the duplicates, like a 'union'? 
If I understood the manual correctly, chaining query parts (with ... with) 
should not 'significantly' increase the time to obtain results, because the 
whole query is 'interpreted' as a single transaction.
Am I right? Or is there a more efficient query than the one I posted 
that would improve the response time?
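
For reference, here is one variant I could try. It is only a sketch and I have not profiled it. It assumes that the planner will use the :Topic(name) index for both lookups when the two endpoints are matched first, and that the duplicates come from parallel relationships between the same pair of nodes (so paths that look identical as node sequences actually differ in their relationships). The alias `topics` and the relationship variable `r` are names I made up for this example.

```cypher
// Sketch only: bind both endpoints by name first, then expand
// the variable-length pattern between the two bound nodes.
MATCH (n:Topic {name: 'title-1'}), (m:Topic {name: 'title-2'})
MATCH p = (n)-[*0..2]-(m)
// Describe each path by the names of its nodes and sum the
// 'proximity' property over its relationships.
WITH [x IN nodes(p) | x.name] AS topics,
     reduce(totProximity = 0, r IN relationships(p) |
            totProximity + r.proximity) AS pathProximity
// DISTINCT collapses rows that have the same node names and the same total.
RETURN DISTINCT topics, pathProximity
ORDER BY pathProximity DESC;
```

With this form, two paths through the same nodes but with different total proximity still come back as separate rows, which would at least show whether parallel relationships are the cause.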

thank you

 

>
> On Fri, Oct 3, 2014 at 11:43 PM, gg4u <[email protected]> wrote:
>
>> Hi,
>>
>> here is my new question. I ran into this issue:
>>
>> I have a large weighted graph with only one schema index on nodes (Topic):
>> 4M topics and 100M rels.
>>
>> I want to find the paths between two given nodes.
>>
>> I tried queries like the ones below.
>> Since it is a weighted graph, I compute the weight of a path between two 
>> nodes as the sum of its relationship weights (I call the weight 
>> 'proximity' here).
>>
>> The problem is that a query of this type, on such a large graph, takes ages.
>>
>> Note that looking up the nodes through an index or directly by internal id 
>> gives the same response times.
>> *Is there any way to speed up performance to a reasonable production time?* 
>> (lower than 1s ... that means 3 orders of magnitude ... )
>>
>> MATCH (n) , (m), p = (n)-[*0..2]-(m)
>> where id(n) = 103105 and id(m) = 1386672
>> with p, n, m
>> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
>> n.proximity) AS pathProximity order by pathProximity DESC;
>>
>> *~1M ms !!! *
>>
>>
>> The same happens with
>> MATCH (n:Topic) , (m:Topic), p = (n)-[*0..2]-(m)
>> where n.name = 'title-1' and m.name = 'title-2'
>> with p, n, m
>> return p, reduce(totProximity = 0, n IN relationships(p)| totProximity + 
>> n.proximity) AS pathProximity order by pathProximity DESC;
>>
>> *~2M ms !!! *
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

