Re: [Neo4j] On relative performance of native querying in Java vs. Cypher querying.

Michael Hunger Sun, 23 Mar 2014 04:24:14 -0700

Happy I could help.

Please consider blogging about it once you have worked out your solution, and 
don't hesitate to come back with other questions.


Cheers,

Michael

On 20.03.2014 at 11:36, [email protected] wrote:

> Dear Michael,
> 
> Thank you for your detailed response and the multiple clarifications.
> 
> I had considered some of these concerns (but did not want to make the set of 
> tests even longer), but there were others (like the limit expression) that I 
> was not aware of at all.
> 
> It has been very helpful in furthering my understanding of Neo.
> 
> Cheers,
> 
> 
> On Monday, March 17, 2014 4:05:41 PM UTC, [email protected] wrote:
> Author's note: Even though this post partially delves into the technical 
> characteristics of Cypher, it was created to start a discussion on the 
> relative scalability and performance of native Java vs. Cypher, so I believed 
> it should be placed in the group forum as opposed to Stack Overflow. If the 
> coordinators disagree with this decision, please move it to SO. Thank you.
> 
> * Neo4j version, library versions, OS, JDK:
> 
> Neo4j: Stable Release 2.0.1 [community edition], with relevant libraries.
> OS: Windows 7 (64-bit), Service Pack 1, 8 GB RAM, i5 quad-core desktop 
> (without hyper-threading).
> JDK: Java 7 (jdk1.7.0_51).
> All applications are in Java, using an embedded database instance (Java 
> runtime and Neo4j VM/other arguments are default (empty) in each case).
> 
> Fellow Neo4j enthusiasts,
> After performing various simple benchmarks, it appears that using native 
> Java to query a Neo4j graph outperforms doing the same using Cypher.
> This would normally be surprising in a relational context, but I can see how 
> it can make sense in this graph context (as virtual memory is heavily used, 
> etc.).
> As such, I would like to know whether this is in fact the case for the 
> majority of queries that can be made in Neo4j, or whether I am missing 
> something glaringly obvious in my tests.
>  
> A summary of my test parameters is as follows:
> * A graph of 100k nodes (each having one property) is used as a baseline 
>   (relationships and more properties are added for later experiments).
> * After graph insertion, the JVM is restarted before querying in order to 
>   clear virtual memory.
> * The same query is performed using Cypher and the native Java API in each 
>   case.
> * The JVM is restarted for every subsequent query experiment to ensure 
>   virtual memory is wiped (virtual memory is embedded into the Java heap on 
>   Windows by default).
> * During each experiment, 10 repetitions of the query are performed, with 
>   each repetition having a small change in one of the query variables. This 
>   is to test execution time for the first instance (hence before any nodes 
>   are in virtual memory) as well as for subsequent calls, where most of the 
>   nodes queried are in RAM.
> * Execution time is recorded using System.nanoTime() for accuracy.
> * The Cypher queries have been optimised as much as possible (with my 
>   current knowledge), using a single execution engine per experiment, and 
>   the changing variable is injected as a parameter to the query.
> 
> Hypothesis:
> 
> For all experiments (details below) both for initial query execution as well 
> as subsequent executions, the native java api outperforms cypher.
> 
> The following queries were performed (code presented below, this is a summary 
> in text for conciseness):
> 1. Get me all nodes which have property 'i' equal to {max}. NB: The property 
>    'i' ranges from 0 to 100k across the nodes, and the queries test values 
>    across the whole range to ensure that the Java execution does not get 
>    preferential treatment by finding results close to the top of the node 
>    file (as is shown in the results).
> 2. Get me all nodes which have property 'i' equal to {max}, through a single 
>    indexed node. In my context (model-driven engineering research) it is 
>    common to have a single (or very few) starting points for a query, so my 
>    second test simulated this behaviour by creating a single "source" node 
>    with a relationship to each of the 100k nodes to be queried. As such, the 
>    query goes to the Lucene index to find the source node and then traverses 
>    its relationships to the 100k nodes in order to be executed. This also 
>    avoids using the GlobalGraphOperations.at(database).getAllNodes() 
>    operation in Java (which is useful, as it would never be used in my 
>    context).
> 3. Get me all nodes which have property 'i' equal to {max} and a relationship 
>    named (of relationship type with name) {name} to another node. This is a 
>    simple extension to the first query which uses a one-hop traversal as 
>    well.
> 4. Get me all nodes which have property 'i' equal to {max} and 'i2' equal to 
>    {max2}.
> 5. Get me all nodes which have property 'i' equal to {max} and 'i2' equal to 
>    {max2} and a relationship named {name} to another node.
> 
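
For readers who do not open the gists, the sketch below shows how queries 1, 3 and 4 might be phrased as parameterised Cypher 2.0 strings held in Java. The query texts are assumptions based only on the prose descriptions above and may differ from what was actually benchmarked. The class CypherQuerySketch and its params helper are hypothetical. Nothing here talks to a database: in an embedded Neo4j 2.0 application the string and the map would be handed to an execution engine.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical query holder; the actual benchmarked queries are in the gists. */
final class CypherQuerySketch {

    // Query 1 (assumed wording): nodes whose property 'i' equals {max}.
    static final String BY_I =
            "MATCH (n) WHERE n.i = {max} RETURN n";

    // Query 3 (assumed wording): as query 1, plus an outgoing relationship of
    // type {name}. The type is compared with type(r) because a relationship
    // type inside the pattern itself cannot be a parameter.
    static final String BY_I_AND_REL =
            "MATCH (n)-[r]->() WHERE n.i = {max} AND type(r) = {name} RETURN n";

    // Query 4 (assumed wording): nodes matching on both 'i' and 'i2'.
    static final String BY_I_AND_I2 =
            "MATCH (n) WHERE n.i = {max} AND n.i2 = {max2} RETURN n";

    /** Builds a parameter map from alternating keys and values. */
    static Map<String, Object> params(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> map = new HashMap<String, Object>();
        for (int k = 0; k < keyValues.length; k += 2) {
            map.put((String) keyValues[k], keyValues[k + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // The query text stays constant between repetitions; only the map changes.
        for (int max = 1000; max <= 3000; max += 1000) {
            System.out.println(BY_I + "  with  " + params("max", max));
        }
        System.out.println(BY_I_AND_REL + "  with  "
                + params("max", 1000, "name", "SOME_TYPE"));
    }
}
```

Keeping the text fixed and changing only the parameter map is what makes it possible for a single execution engine to reuse a query it has already seen, as the test description intends.
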
> Results:
> 
> I will only present the detailed results of the first query, as it would get 
> tediously long to present them all (they are all included as a snippet link 
> below), and as mentioned above, all of them seem to support the statement 
> that Java outperforms Cypher.
> 
> Query 1 results:
> 
> Java (microseconds): 15415  result 'i': 1  testing equality on: 1
> Java (microseconds): 288  result 'i': 2  testing equality on: 2
> Java (microseconds): 333  result 'i': 3  testing equality on: 3
> Java (microseconds): 304  result 'i': 4  testing equality on: 4
> Java (microseconds): 303  result 'i': 5  testing equality on: 5
> Java (microseconds): 319  result 'i': 6  testing equality on: 6
> Java (microseconds): 331  result 'i': 7  testing equality on: 7
> Java (microseconds): 355  result 'i': 8  testing equality on: 8
> Java (microseconds): 368  result 'i': 9  testing equality on: 9
> Java (microseconds): 385  result 'i': 10  testing equality on: 10
> 
> Java (microseconds): 175027  result 'i': 1000  testing equality on: 1000
> Java (microseconds): 146228  result 'i': 2000  testing equality on: 2000
> Java (microseconds): 126249  result 'i': 3000  testing equality on: 3000
> Java (microseconds): 98282  result 'i': 4000  testing equality on: 4000
> Java (microseconds): 69881  result 'i': 5000  testing equality on: 5000
> Java (microseconds): 38536  result 'i': 6000  testing equality on: 6000
> Java (microseconds): 24090  result 'i': 7000  testing equality on: 7000
> Java (microseconds): 25140  result 'i': 8000  testing equality on: 8000
> Java (microseconds): 25849  result 'i': 9000  testing equality on: 9000
> Java (microseconds): 26664  result 'i': 10000  testing equality on: 10000
> 
> Java (microseconds): 1704711  result 'i': 99997  testing equality on: 99997
> Java (microseconds): 119149  result 'i': 99998  testing equality on: 99998
> Java (microseconds): 49827  result 'i': 99999  testing equality on: 99999
> Java (microseconds): 60392  result 'i': -1   testing equality on: 100000
> Java (microseconds): 42451  result 'i': -1   testing equality on: 100001
> Java (microseconds): 35205  result 'i': -1   testing equality on: 100002
> Java (microseconds): 36279  result 'i': -1   testing equality on: 100003
> Java (microseconds): 34999  result 'i': -1   testing equality on: 100004
> Java (microseconds): 35179  result 'i': -1   testing equality on: 100005
> Java (microseconds): 45571  result 'i': -1   testing equality on: 100006
> 
> Cypher [prepared with 1 execution engine] (microseconds): 2688552  result 'i': 100  testing equality on: 100
> Cypher [prepared with 1 execution engine] (microseconds): 134839  result 'i': 200  testing equality on: 200
> Cypher [prepared with 1 execution engine] (microseconds): 116128  result 'i': 300  testing equality on: 300
> Cypher [prepared with 1 execution engine] (microseconds): 96070  result 'i': 400  testing equality on: 400
> Cypher [prepared with 1 execution engine] (microseconds): 111627  result 'i': 500  testing equality on: 500
> Cypher [prepared with 1 execution engine] (microseconds): 116955  result 'i': 600  testing equality on: 600
> Cypher [prepared with 1 execution engine] (microseconds): 98720  result 'i': 700  testing equality on: 700
> Cypher [prepared with 1 execution engine] (microseconds): 96051  result 'i': 800  testing equality on: 800
> Cypher [prepared with 1 execution engine] (microseconds): 106406  result 'i': 900  testing equality on: 900
> Cypher [prepared with 1 execution engine] (microseconds): 97068  result 'i': 1000  testing equality on: 1000
> 
> Cypher [prepared with 1 execution engine] (microseconds): 2651371  result 'i': 1000  testing equality on: 1000
> Cypher [prepared with 1 execution engine] (microseconds): 121623  result 'i': 2000  testing equality on: 2000
> Cypher [prepared with 1 execution engine] (microseconds): 95211  result 'i': 3000  testing equality on: 3000
> Cypher [prepared with 1 execution engine] (microseconds): 79345  result 'i': 4000  testing equality on: 4000
> Cypher [prepared with 1 execution engine] (microseconds): 88915  result 'i': 5000  testing equality on: 5000
> Cypher [prepared with 1 execution engine] (microseconds): 100527  result 'i': 6000  testing equality on: 6000
> Cypher [prepared with 1 execution engine] (microseconds): 77890  result 'i': 7000  testing equality on: 7000
> Cypher [prepared with 1 execution engine] (microseconds): 77430  result 'i': 8000  testing equality on: 8000
> Cypher [prepared with 1 execution engine] (microseconds): 76451  result 'i': 9000  testing equality on: 9000
> Cypher [prepared with 1 execution engine] (microseconds): 86732  result 'i': 10000  testing equality on: 10000
> 
> As we can clearly see, Java can "cheat" on low equality tests (we break 
> after finding the node, as we assume (and know in our context) that 'i' is 
> unique), but more interestingly, even when the lookup fails (testing values 
> of 'i' of 100k and above) it is still roughly 2 times as fast as Cypher, 
> both for initial queries and subsequent ones (this is a good test for 
> non-unique properties too, as it forces Java to iterate through all of the 
> nodes present).
> 
> This shows two things in my view:
> 1) Java can optimise for unique results. As far as I am aware cypher cannot 
> be told to stop when it finds a result we know is unique (such as an ISBN of 
> a book for example or any other unique property in a node).
> 2) For non-unique results (or for a failed query) it is still faster than 
> cypher.
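
The two observations above come down to how many nodes a scan has to touch. The sketch below counts visited elements for a scan that stops at the first match versus one that always runs to the end. It is a plain-Java illustration over an in-memory list, not Neo4j code: the class UniqueScanSketch and its methods are hypothetical, and the values 0 to 99999 are an assumption about how 'i' was assigned. The reply quoted at the top of the thread mentions a limit expression; a LIMIT 1 clause is presumably the Cypher counterpart of the early return shown here, though that is an assumption about what was suggested.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of early exit versus full scan; not Neo4j code. */
final class UniqueScanSketch {

    /** Returns {match or -1, elements visited}; stops at the first match. */
    static int[] findFirst(List<Integer> values, int wanted) {
        int visited = 0;
        for (int value : values) {
            visited++;
            if (value == wanted) {
                return new int[] {value, visited};
            }
        }
        return new int[] {-1, visited};
    }

    /** Returns {number of matches, elements visited}; never stops early. */
    static int[] countAll(List<Integer> values, int wanted) {
        int matches = 0;
        int visited = 0;
        for (int value : values) {
            visited++;
            if (value == wanted) {
                matches++;
            }
        }
        return new int[] {matches, visited};
    }

    /** Builds the list 0, 1, ..., n - 1 as a stand-in for the node store. */
    static List<Integer> range(int n) {
        List<Integer> values = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Integer> values = range(100000);
        int[] probes = {5, 99999, 100000};
        for (int wanted : probes) {
            int[] first = findFirst(values, wanted);
            int[] all = countAll(values, wanted);
            System.out.println("i = " + wanted
                    + ": early exit visited " + first[1]
                    + ", full scan visited " + all[1]);
        }
    }
}
```

A miss costs the early-exit scan exactly as much as the full scan, which is why the failed lookups are the fairest like-for-like comparison in the results.
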
> 
> After getting these results my curiosity prompted me to expand the scope by 
> adding relationships and a second attribute to see if the same trend 
> continues, and it did.
> 
> Links to code snippets:
> 
> first query:
> https://gist.github.com/anonymous/9601553
> 
> second query:
> https://gist.github.com/anonymous/9601556
> 
> third query:
> https://gist.github.com/anonymous/9601571
> 
> fourth:
> https://gist.github.com/anonymous/9601581
> 
> fifth:
> https://gist.github.com/anonymous/9601590
> 
> entire result set link:
> https://gist.github.com/anonymous/9601486
> 
> Discussion:
> 
> My main question is whether these results are to be expected (or even obvious 
> to some) which would mean I will just use native java in my application.
> If my results are wrong/misleading I would appreciate knowing why but if not, 
> a discussion on how to improve cypher to attempt to close the gap may be 
> useful.
> 
> Other notes and observations I had (non-conclusive, as the tests performed 
> were not as thorough as the above):
> * Using 'count' in Cypher seems to destroy execution time (whereas normally 
>   in SQL it improves it).
> * Adding depth to a Cypher search (for example, going from (a)-[]->(b) to 
>   (a)-[]->(b)-[]->(c)) seems to scale a lot worse in Cypher than in Java.
> 
> Thank you for reading this wall,
> 
> Costas
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
