[Neo4j] Re: On relative performance of native querying in Java vs. Cypher querying.

costas . barbis Thu, 20 Mar 2014 03:37:34 -0700

Dear Michael,

Thank you for your detailed response and the multiple clarifications.


I was considering some of these concerns (but did not want to add an even 
longer amount of tests) but there were some (like the limit expression) I 
was not aware of at all.

It has been very helpful in furthering my understanding of Neo.

Cheers,


On Monday, March 17, 2014 4:05:41 PM UTC, [email protected] wrote:

> *Author's note:* Even though this post seems to be partially delving into 
> the technical characteristics of Cypher it has been created as an initiator 
> for discussion on relative scalability and performance of native vs. cypher 
> so I believed it should be placed in the group forum as opposed to stack 
> overflow. If the coordinators disagree on this decision please move it to 
> SO. Thank you. 
> ** neo4j version, library versions, OS, jdk:*
>
> Neo4j Stable Release 2.0.1 [community edition], with relevant libraries.
> OS: windows 7 (64 bit), service pack 1, 8gb ram, i5 quad core desktop 
> (without hyper-threading).
> jdk: java 7 (jdk1.7.0_51).
> all applications are in java, using embedded database instance (java 
> runtime and neo4j vm/other arguments are default (empty) in each case).
>
> Fellow Neo4J enthusiasts,
> After performing various simple benchmarks it would appear that using 
> native java to query a neo graph outperforms doing the same using cypher. 
> This would normally be surprising in a relational context but i can see 
> how it can make sense in this graph context (as virtual memory is heavily 
> used etc.). 
> As such I would like to know if this is in fact the case for the majority 
> of queries that can be made in neo4j, or whether I am missing something 
> glaringly obvious in my tests.
>  
> *A summary of my test parameters is as follows:*
>
>    - a graph of 100k nodes (each having one property) is used as a 
>    baseline (relationships and more properties are added for later 
>    experiments).
>    - after graph insertion the jvm is restarted before querying in order 
>    to clear virtual memory.
>    - the same query is performed using cypher and neo4j in each case.
>    - the jvm is restarted for every subsequent query experiment to ensure 
>    virtual memory is wiped (vm is embedded into java heap in windows by 
>    default).
>    - during each experiment 10 repetitions of the query are performed, 
>    with each repetition having a small change in one of the query variables. 
>    This is to test execution time for the first instance (hence before any 
>    nodes are in virtual memory) as well as subsequent calls where most of the 
>    nodes queried are in RAM.
>    - execution time is recorded using System.nanotime() for accuracy.
>    - the cypher queries have been optimised as much as possible (with my 
>    current knowledge) using a single execution engine per experiment and the 
>    changing variable is injected as a parameter to the query.
>
> *Hypothesis:*
> For all experiments (details below) both for initial query execution as 
> well as subsequent executions, the native java api outperforms cypher.
>
> *The following queries were performed (code presented below, this is a 
> summary in text for conciseness):*
>
>    - get me all nodes which have property 'i' equal to {max}. NB: The 
>    variable {max} ranges from 0 to 100k in the nodes and the queries test all 
>    possible ranges to ensure that the java execution does not have a 
>    preferential treatment by finding results close to the top of the node 
> file 
>    (as is shown in the results).
>    - get me all nodes which have property 'i' equal to {max}, through a 
>    single indexed node. In my context (model-driven engineering research) it 
>    is common to have a single (or very few) starting points for a query, so 
> my 
>    second test simulated this behaviour by creating a single "source" node 
>    with relationship to the 100k nodes to be queried. as such, the query goes 
>    to the lucene index to find the node and then traverses the relationship 
> it 
>    has to the 100k nodes in order to be executed. This also avoids using the 
>    GlobalGraphOperations.at(database).getAllNodes() operation in java (which 
>    is useful as it would never be used in my context).
>    - get me all nodes which have property 'i' equal to {max} and 
>    relationship named (of relationship type with name) {name} to another 
> node. 
>    this is a simple extension to the first query which uses a one-hop 
>    traversal as well.
>    - get me all nodes which have property 'i' equal to {max} and 'i2' 
>    equal to {max2}.
>    - get me all nodes which have property 'i' equal to {max} and 'i2' 
>    equal to {max2} and relationship named {name} to another node. 
>
> *Results:*
>
> I will only present the detailed results of the first query as it would 
> get tediously long to present them all (they are all included as a snippet 
> link below), and as mentioned above, all of them seem to support the 
> statement that java 
>
> outperforms cypher.
>
> *Query 1 results:*
> Java (microseconds): 15415  result 'i': 1  testing equality on: 1
> Java (microseconds): 288  result 'i': 2  testing equality on: 2
> Java (microseconds): 333  result 'i': 3  testing equality on: 3
> Java (microseconds): 304  result 'i': 4  testing equality on: 4
> Java (microseconds): 303  result 'i': 5  testing equality on: 5
> Java (microseconds): 319  result 'i': 6  testing equality on: 6
> Java (microseconds): 331  result 'i': 7  testing equality on: 7
> Java (microseconds): 355  result 'i': 8  testing equality on: 8
> Java (microseconds): 368  result 'i': 9  testing equality on: 9
> Java (microseconds): 385  result 'i': 10  testing equality on: 10
>
> Java (microseconds): 175027  result 'i': 1000  testing equality on: 1000
> Java (microseconds): 146228  result 'i': 2000  testing equality on: 2000
> Java (microseconds): 126249  result 'i': 3000  testing equality on: 3000
> Java (microseconds): 98282  result 'i': 4000  testing equality on: 4000
> Java (microseconds): 69881  result 'i': 5000  testing equality on: 5000
> Java (microseconds): 38536  result 'i': 6000  testing equality on: 6000
> Java (microseconds): 24090  result 'i': 7000  testing equality on: 7000
> Java (microseconds): 25140  result 'i': 8000  testing equality on: 8000
> Java (microseconds): 25849  result 'i': 9000  testing equality on: 9000
> Java (microseconds): 26664  result 'i': 10000  testing equality on: 10000
>
> Java (microseconds): 1704711  result 'i': 99997  testing equality on: 99997
> Java (microseconds): 119149  result 'i': 99998  testing equality on: 99998
> Java (microseconds): 49827  result 'i': 99999  testing equality on: 99999
> Java (microseconds): 60392  result 'i': -1   testing equality on: 100000
> Java (microseconds): 42451  result 'i': -1   testing equality on: 100001
> Java (microseconds): 35205  result 'i': -1   testing equality on: 100002
> Java (microseconds): 36279  result 'i': -1   testing equality on: 100003
> Java (microseconds): 34999  result 'i': -1   testing equality on: 100004
> Java (microseconds): 35179  result 'i': -1   testing equality on: 100005
> Java (microseconds): 45571  result 'i': -1   testing equality on: 100006
>
> Cypher [prepared with 1 execution engine] (microseconds): 2688552  result 
> 'i': 100  testing equality on: 100
> Cypher [prepared with 1 execution engine] (microseconds): 134839  result 
> 'i': 200  testing equality on: 200
> Cypher [prepared with 1 execution engine] (microseconds): 116128  result 
> 'i': 300  testing equality on: 300
> Cypher [prepared with 1 execution engine] (microseconds): 96070   result 
> 'i': 400  testing equality on: 400
> Cypher [prepared with 1 execution engine] (microseconds): 111627  result 
> 'i': 500  testing equality on: 500
> Cypher [prepared with 1 execution engine] (microseconds): 116955  result 
> 'i': 600  testing equality on: 600
> Cypher [prepared with 1 execution engine] (microseconds): 98720   result 
> 'i': 700  testing equality on: 700
> Cypher [prepared with 1 execution engine] (microseconds): 96051   result 
> 'i': 800  testing equality on: 800
> Cypher [prepared with 1 execution engine] (microseconds): 106406  result 
> 'i': 900  testing equality on: 900
> Cypher [prepared with 1 execution engine] (microseconds): 97068   result 
> 'i': 1000  testing equality on: 1000
>
> Cypher [prepared with 1 execution engine] (microseconds): 2651371  result 
> 'i': 1000  testing equality on: 1000
> Cypher [prepared with 1 execution engine] (microseconds): 121623  result 
> 'i': 2000  testing equality on: 2000
> Cypher [prepared with 1 execution engine] (microseconds): 95211   result 
> 'i': 3000  testing equality on: 3000
> Cypher [prepared with 1 execution engine] (microseconds): 79345   result 
> 'i': 4000  testing equality on: 4000
> Cypher [prepared with 1 execution engine] (microseconds): 88915   result 
> 'i': 5000  testing equality on: 5000
> Cypher [prepared with 1 execution engine] (microseconds): 100527  result 
> 'i': 6000  testing equality on: 6000
> Cypher [prepared with 1 execution engine] (microseconds): 77890   result 
> 'i': 7000  testing equality on: 7000
> Cypher [prepared with 1 execution engine] (microseconds): 77430   result 
> 'i': 8000  testing equality on: 8000
> Cypher [prepared with 1 execution engine] (microseconds): 76451   result 
> 'i': 9000  testing equality on: 9000
> Cypher [prepared with 1 execution engine] (microseconds): 86732   result 
> 'i': 10000  testing equality on: 10000
>
> As we can clearly see, Java can "cheat" on low equality tests (as we break 
> after finding the node as we assume (and know in our context) 'i' is 
> unique) but more interestingly even when it fails (checking i > 100k) it is 
> still roughly 2 
>
> times as fast as cypher both for initial queries and subsequent ones (this 
> is a good test for non-unique properties too as it forces java to iterate 
> through all of the nodes present).
> This shows two things in my view:
> 1) Java can optimise for unique results. As far as I am aware cypher 
> cannot be told to stop when it finds a result we know is unique (such as an 
> ISBN of a book for example or any other unique property in a node).
> 2) For non-unique results (or for a failed query) it is still faster than 
> cypher.
>
> After getting these results my curiosity prompted me to expand the scope 
> by adding relationships and a second attribute to see if the same trend 
> continues, and it did.
>
> *Links to code snippets:*
> first query:
> https://gist.github.com/anonymous/9601553
>
> second query:
> https://gist.github.com/anonymous/9601556
>
> third query:
> https://gist.github.com/anonymous/9601571
>
> fourth:
> https://gist.github.com/anonymous/9601581
>
> fifth:
> https://gist.github.com/anonymous/9601590
>
> entire result set link:
> https://gist.github.com/anonymous/9601486
>
> *Discussion:*
>
> My main question is whether these results are to be expected (or even 
> obvious to some) which would mean I will just use native java in my 
> application.
> If my results are wrong/misleading I would appreciate knowing why but if 
> not, a discussion on how to improve cypher to attempt to close the gap may 
> be useful.
>
> Other notes and observations I had (non-conclusive as the tests performed 
> were not as thorough as the above):
> using 'count' in cypher seems to destroy execution time (whereas normally 
> in sql it improves it).
> adding depth to a cypher search (for example going from (a)-[]->(b) to 
> (a)-[]->(b)-[]->(c)) seems to scale a lot worst in cypher than java.
>
> Thank you for reading this wall,
>
> Costas
>
>  
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[Neo4j] Re: On relative performance of native querying in Java vs. Cypher querying.

Reply via email to