Re: [Neo4j] On relative performance of native querying in Java vs. Cypher querying.

Michael Hunger Mon, 17 Mar 2014 13:33:59 -0700

Hi Costas,

This is quite an email, and quite an exercise.


In general the Java APIs outperform Cypher, as they don't include parsing 
queries, transforming results, etc., and are already compiled by the Java 
compiler to efficient bytecode on the JVM.
Cypher runs more like an interpreter on top of them.

Currently Cypher doesn't fully leverage the underlying SPI everywhere, so in 
places it runs on a higher-level API (roughly on top of the Java API), but this 
is bound to change.

There are some issues with your tests:

- you don't do the same thing in Cypher and Java: your Java code takes 
shortcuts and leaves out operations that Cypher still performs
  - e.g. to stop execution after the first hit you would use LIMIT 1 on the 
RETURN; also, in Cypher you build strings out of the results, something you 
don't do in Java
  - and you string-concatenate the queries, although they could be constants
- your queries are not really graph queries but full scans over all nodes, 
loading them all and comparing properties; in reality you almost never do 
something like that
- instead you use indexes and unique constraints in conjunction with labels to 
load exactly one item when you do equality lookups
- your graph model is not really realistic: a single node with many 
relationships is not what you would navigate from
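To make the LIMIT 1 point concrete, here is a plain-Java analogy, not the Neo4j API. The hypothetical `EarlyExit` class treats the ints 0..size-1 as stand-ins for the nodes' 'i' values and counts how many it compares. `firstMatch` stops at the first hit, like a Java loop with `break` or a query ending in LIMIT 1; `allMatches` compares every element, like the same query without LIMIT. Timing one against the other measures different amounts of work, not Java against Cypher.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java analogy (hypothetical class, not the Neo4j API): the ints
// 0..size-1 stand in for the 'i' property values of the nodes.
class EarlyExit {
    // Number of property values compared by the most recent call.
    static int examined;

    // Compares one property value and counts the comparison.
    static boolean matches(int value, int wanted) {
        examined++;
        return value == wanted;
    }

    // Analogous to a query ending in LIMIT 1: findFirst() short-circuits,
    // so a sequential stream stops pulling elements after the first match.
    static Optional<Integer> firstMatch(int size, int wanted) {
        examined = 0;
        return IntStream.range(0, size).boxed()
                .filter(v -> matches(v, wanted))
                .findFirst();
    }

    // Analogous to the same query without LIMIT: every element is compared.
    static List<Integer> allMatches(int size, int wanted) {
        examined = 0;
        return IntStream.range(0, size).boxed()
                .filter(v -> matches(v, wanted))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Optional<Integer> first = firstMatch(100_000, 10);
        System.out.println("first match " + first + " after " + examined + " comparisons");
        List<Integer> all = allMatches(100_000, 10);
        System.out.println("all matches " + all + " after " + examined + " comparisons");
    }
}
```

With 100,000 values and a wanted value of 10, the first variant stops after 11 comparisons while the second makes all 100,000.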

So while your general observation is still correct, you should make sure that 
your methods are 

#1 comparable
#2 using realistic operations


E.g. after changing your App2 a bit to accommodate some of what I said, I got:

Java (microseconds): 106944      result 'i': 100         testing equality on: 100
Cypher [prepared with 1 execution engine] (microseconds): 98206  result 'i': 100         testing equality on: 100

As you can see, it's not necessarily like that: here Cypher is even slightly faster.

    public static final String QUERY = "START m=node:index('id:indexnode') MATCH (m)-->(n) WHERE n.i={max} return n.i as i LIMIT 1";
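Keeping the query text constant and passing the changing value as the {max} parameter matters because, as far as I know, Cypher caches execution plans keyed by the query string, so every concatenated variant has to be parsed and planned again. The hypothetical `PlanCacheDemo` class below only models such a cache with a `HashMap` and counts how many plans would be built; it does not parse or run anything, and it assumes Java 8+ rather than the Java 7 used in these tests.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Neo4j code) of a plan cache keyed by query text.
// It only counts how many "plans" would be built; nothing is parsed or executed.
class PlanCacheDemo {
    private final Map<String, String> cache = new HashMap<>();
    private int plansBuilt = 0;

    // Returns the cached plan for this exact text, building one on a cache miss.
    String planFor(String queryText) {
        return cache.computeIfAbsent(queryText, q -> {
            plansBuilt++;
            return "plan for: " + q;
        });
    }

    int plansBuilt() {
        return plansBuilt;
    }

    // Same constant text as the QUERY above; only the parameter value changes.
    static final String QUERY =
        "START m=node:index('id:indexnode') MATCH (m)-->(n) WHERE n.i={max} return n.i as i LIMIT 1";

    // Issues 'runs' lookups both ways and returns
    // {plans built with concatenation, plans built with a parameter}.
    static int[] compare(int runs) {
        PlanCacheDemo concatenated = new PlanCacheDemo();
        PlanCacheDemo parameterized = new PlanCacheDemo();
        for (int max = 0; max < runs; max++) {
            // Value baked into the text: every iteration produces a new cache key.
            concatenated.planFor("START m=node:index('id:indexnode') MATCH (m)-->(n) WHERE n.i="
                + max + " return n.i as i LIMIT 1");
            // Constant text; the value travels separately in a parameter map
            // (built here only to show its shape, the toy cache never reads it).
            Map<String, Object> params = Collections.<String, Object>singletonMap("max", max);
            parameterized.planFor(QUERY);
        }
        return new int[] {concatenated.plansBuilt(), parameterized.plansBuilt()};
    }

    public static void main(String[] args) {
        int[] built = compare(10);
        System.out.println("concatenated: " + built[0] + " plans, parameterized: " + built[1] + " plan(s)");
    }
}
```

Ten lookups build ten plans in the concatenated case and one in the parameterized case.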

And if you create a unique constraint on :node(id) and use a query like this:

    MATCH (m)-->(n:node {i:{max}}) return n.i as i LIMIT 1

you get these numbers:
Cypher [prepared with 1 execution engine] (microseconds): 842    result 'i': indexnode   testing equality on: 100
Java (microseconds): 356         result 'i': indexnode   testing equality on: 100

see: https://gist.github.com/jexp/9607779

So using the right graph model and the correct indexes makes much more 
difference than just choosing between Java and Cypher.
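As a rough illustration of why the index matters more than the language, here is a plain-Java analogy (hypothetical `IndexVsScan` class, not the Neo4j API): property values live in a list, and a `HashMap` plays the role of a unique index. A scan examines up to every node, and on a miss always examines all of them, while the index answers with a single hash lookup regardless of graph size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy only: "nodes" are int property values in a list, and the
// "index" is a HashMap from property value to list position.
class IndexVsScan {
    private final List<Integer> nodes = new ArrayList<>();
    private final Map<Integer, Integer> index = new HashMap<>();

    IndexVsScan(int count) {
        for (int i = 0; i < count; i++) {
            nodes.add(i);      // property 'i' of the node at position i
            index.put(i, i);   // unique value -> node position
        }
    }

    // Full scan: returns {position or -1, number of nodes examined}.
    int[] scan(int wanted) {
        int examined = 0;
        for (int pos = 0; pos < nodes.size(); pos++) {
            examined++;
            if (nodes.get(pos) == wanted) {
                return new int[] {pos, examined};
            }
        }
        return new int[] {-1, examined};
    }

    // Index lookup: one hash probe, independent of the number of nodes.
    int lookup(int wanted) {
        Integer pos = index.get(wanted);
        return pos == null ? -1 : pos;
    }

    public static void main(String[] args) {
        IndexVsScan g = new IndexVsScan(100_000);
        int[] hit = g.scan(99_999);
        int[] miss = g.scan(100_000);
        System.out.println("scan hit examined " + hit[1] + ", miss examined " + miss[1]);
        System.out.println("index lookup: " + g.lookup(99_999));
    }
}
```

With 100,000 values, `scan(99_999)` and `scan(100_000)` both examine all 100,000 entries, which mirrors the slow high-value and failed lookups in the original full-scan test.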

MacOSX 10.9
Java 1.7_b45
Neo4j 2.0.1
16GB RAM, defaults

Thanks again for all your work.

Cheers,

Michael

----
(michael)-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
Learn Online, Offline or Read a Book (in Deutsch)
We're trading T-shirts for cool Graph Models







On 17.03.2014 at 17:05, [email protected] wrote:

> Author's note: Even though this post partially delves into the technical 
> characteristics of Cypher, it was created to start a discussion on the 
> relative scalability and performance of native vs. Cypher querying, so I 
> believed it should be placed in the group forum as opposed to Stack Overflow. 
> If the coordinators disagree with this decision, please move it to SO. Thank you.
> 
> * Neo4j version, library versions, OS, JDK:
> 
> Neo4j stable release 2.0.1 [community edition], with relevant libraries.
> OS: Windows 7 (64-bit), Service Pack 1, 8 GB RAM, i5 quad-core desktop 
> (without hyper-threading).
> JDK: Java 7 (jdk1.7.0_51).
> All applications are in Java, using an embedded database instance (Java 
> runtime and Neo4j VM/other arguments are default (empty) in each case).
> 
> Fellow Neo4j enthusiasts,
> After performing various simple benchmarks, it would appear that using native 
> Java to query a Neo4j graph outperforms doing the same using Cypher.
> This would normally be surprising in a relational context, but I can see how 
> it can make sense in this graph context (as virtual memory is heavily used, 
> etc.).
> As such, I would like to know whether this is in fact the case for the 
> majority of queries that can be made in Neo4j, or whether I am missing 
> something glaringly obvious in my tests.
>  
> A summary of my test parameters is as follows:
> - A graph of 100k nodes (each having one property) is used as a baseline 
>   (relationships and more properties are added for later experiments).
> - After graph insertion the JVM is restarted before querying, in order to 
>   clear virtual memory.
> - The same query is performed using Cypher and the native Java API in each case.
> - The JVM is restarted for every subsequent query experiment to ensure virtual 
>   memory is wiped (VM is embedded into the Java heap on Windows by default).
> - During each experiment 10 repetitions of the query are performed, each 
>   with a small change in one of the query variables. This tests execution 
>   time for the first instance (hence before any nodes are in virtual memory) 
>   as well as for subsequent calls, where most of the nodes queried are in RAM.
> - Execution time is recorded using System.nanoTime() for precision.
> - The Cypher queries have been optimised as much as possible (with my current 
>   knowledge) by using a single execution engine per experiment and injecting 
>   the changing variable as a query parameter.
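For measurements like these, here is a sketch of a harness, assuming each query can be wrapped as an `IntUnaryOperator` from the changing parameter to a result (the `Timing` class and its method names are hypothetical). It times each call with `System.nanoTime()`, a monotonic clock intended for elapsed time, and reports the first (cold) call separately from the median of the remaining calls, since the first call also pays for class loading, JIT compilation and cache warm-up.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.function.IntUnaryOperator;

// Hypothetical harness: 'query' stands in for one Java-API or Cypher lookup that
// maps the changing parameter (e.g. {max}) to a result.
class Timing {
    // Results are accumulated here so the JIT cannot simply discard the calls.
    static volatile long sink;

    // Times one call in microseconds. System.nanoTime() is a monotonic clock for
    // elapsed time; its actual resolution is platform-dependent.
    static long timeOnceMicros(IntUnaryOperator query, int param) {
        long start = System.nanoTime();
        int result = query.applyAsInt(param);
        long elapsed = System.nanoTime() - start;
        sink += result;
        return TimeUnit.NANOSECONDS.toMicros(elapsed);
    }

    // Returns {first (cold) call, median of the remaining (warm) calls} in
    // microseconds; for an even number of warm calls this is the upper median.
    static long[] coldAndWarmMedian(IntUnaryOperator query, int[] params) {
        long cold = timeOnceMicros(query, params[0]);
        long[] warm = new long[params.length - 1];
        for (int k = 1; k < params.length; k++) {
            warm[k - 1] = timeOnceMicros(query, params[k]);
        }
        Arrays.sort(warm);
        long median = warm.length == 0 ? cold : warm[warm.length / 2];
        return new long[] {cold, median};
    }

    public static void main(String[] args) {
        // Stand-in workload: a linear scan over 100k values, like the full-scan test.
        IntUnaryOperator scan = wanted -> {
            for (int i = 0; i < 100_000; i++) {
                if (i == wanted) {
                    return i;
                }
            }
            return -1;
        };
        long[] r = coldAndWarmMedian(scan, new int[] {1000, 2000, 3000, 4000, 5000});
        System.out.println("cold: " + r[0] + " us, warm median: " + r[1] + " us");
    }
}
```

Reporting a median of the warm runs rather than each single run makes one-off spikes, such as the 1704711 microsecond Java outlier further down, easier to spot as noise.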
> Hypothesis:
> 
> For all experiments (details below), both for the initial query execution 
> and for subsequent executions, the native Java API outperforms Cypher.
> 
> The following queries were performed (code presented below; this is a 
> summary in text for conciseness):
> 
> 1. Get me all nodes which have property 'i' equal to {max}. NB: {max} is 
>    varied across the whole 0 to 100k range of 'i' in the nodes, to ensure 
>    that the Java execution gets no preferential treatment from finding 
>    results close to the top of the node file (as is shown in the results).
> 2. Get me all nodes which have property 'i' equal to {max}, through a single 
>    indexed node. In my context (model-driven engineering research) it is 
>    common to have a single (or very few) starting points for a query, so my 
>    second test simulates this behaviour by creating a single "source" node 
>    with relationships to the 100k nodes to be queried. As such, the query 
>    goes to the Lucene index to find that node and then traverses its 
>    relationships to the 100k nodes. This also avoids using the 
>    GlobalGraphOperations.at(database).getAllNodes() operation in Java (which 
>    is useful, as it would never be used in my context).
> 3. Get me all nodes which have property 'i' equal to {max} and a 
>    relationship of the type named {name} to another node. This is a simple 
>    extension of the first query that adds a one-hop traversal.
> 4. Get me all nodes which have property 'i' equal to {max} and 'i2' equal 
>    to {max2}.
> 5. Get me all nodes which have property 'i' equal to {max} and 'i2' equal 
>    to {max2} and a relationship of the type named {name} to another node.
> Results:
> 
> I will only present the detailed results of the first query, as it would get 
> tediously long to present them all (they are all included in a snippet link 
> below), and, as mentioned above, all of them seem to support the statement 
> that Java outperforms Cypher.
> 
> Query 1 results:
> 
> Java (microseconds): 15415  result 'i': 1  testing equality on: 1
> Java (microseconds): 288  result 'i': 2  testing equality on: 2
> Java (microseconds): 333  result 'i': 3  testing equality on: 3
> Java (microseconds): 304  result 'i': 4  testing equality on: 4
> Java (microseconds): 303  result 'i': 5  testing equality on: 5
> Java (microseconds): 319  result 'i': 6  testing equality on: 6
> Java (microseconds): 331  result 'i': 7  testing equality on: 7
> Java (microseconds): 355  result 'i': 8  testing equality on: 8
> Java (microseconds): 368  result 'i': 9  testing equality on: 9
> Java (microseconds): 385  result 'i': 10  testing equality on: 10
> 
> Java (microseconds): 175027  result 'i': 1000  testing equality on: 1000
> Java (microseconds): 146228  result 'i': 2000  testing equality on: 2000
> Java (microseconds): 126249  result 'i': 3000  testing equality on: 3000
> Java (microseconds): 98282  result 'i': 4000  testing equality on: 4000
> Java (microseconds): 69881  result 'i': 5000  testing equality on: 5000
> Java (microseconds): 38536  result 'i': 6000  testing equality on: 6000
> Java (microseconds): 24090  result 'i': 7000  testing equality on: 7000
> Java (microseconds): 25140  result 'i': 8000  testing equality on: 8000
> Java (microseconds): 25849  result 'i': 9000  testing equality on: 9000
> Java (microseconds): 26664  result 'i': 10000  testing equality on: 10000
> 
> Java (microseconds): 1704711  result 'i': 99997  testing equality on: 99997
> Java (microseconds): 119149  result 'i': 99998  testing equality on: 99998
> Java (microseconds): 49827  result 'i': 99999  testing equality on: 99999
> Java (microseconds): 60392  result 'i': -1   testing equality on: 100000
> Java (microseconds): 42451  result 'i': -1   testing equality on: 100001
> Java (microseconds): 35205  result 'i': -1   testing equality on: 100002
> Java (microseconds): 36279  result 'i': -1   testing equality on: 100003
> Java (microseconds): 34999  result 'i': -1   testing equality on: 100004
> Java (microseconds): 35179  result 'i': -1   testing equality on: 100005
> Java (microseconds): 45571  result 'i': -1   testing equality on: 100006
> 
> Cypher [prepared with 1 execution engine] (microseconds): 2688552  result 'i': 100  testing equality on: 100
> Cypher [prepared with 1 execution engine] (microseconds): 134839  result 'i': 200  testing equality on: 200
> Cypher [prepared with 1 execution engine] (microseconds): 116128  result 'i': 300  testing equality on: 300
> Cypher [prepared with 1 execution engine] (microseconds): 96070   result 'i': 400  testing equality on: 400
> Cypher [prepared with 1 execution engine] (microseconds): 111627  result 'i': 500  testing equality on: 500
> Cypher [prepared with 1 execution engine] (microseconds): 116955  result 'i': 600  testing equality on: 600
> Cypher [prepared with 1 execution engine] (microseconds): 98720   result 'i': 700  testing equality on: 700
> Cypher [prepared with 1 execution engine] (microseconds): 96051   result 'i': 800  testing equality on: 800
> Cypher [prepared with 1 execution engine] (microseconds): 106406  result 'i': 900  testing equality on: 900
> Cypher [prepared with 1 execution engine] (microseconds): 97068   result 'i': 1000  testing equality on: 1000
> 
> Cypher [prepared with 1 execution engine] (microseconds): 2651371  result 'i': 1000  testing equality on: 1000
> Cypher [prepared with 1 execution engine] (microseconds): 121623  result 'i': 2000  testing equality on: 2000
> Cypher [prepared with 1 execution engine] (microseconds): 95211   result 'i': 3000  testing equality on: 3000
> Cypher [prepared with 1 execution engine] (microseconds): 79345   result 'i': 4000  testing equality on: 4000
> Cypher [prepared with 1 execution engine] (microseconds): 88915   result 'i': 5000  testing equality on: 5000
> Cypher [prepared with 1 execution engine] (microseconds): 100527  result 'i': 6000  testing equality on: 6000
> Cypher [prepared with 1 execution engine] (microseconds): 77890   result 'i': 7000  testing equality on: 7000
> Cypher [prepared with 1 execution engine] (microseconds): 77430   result 'i': 8000  testing equality on: 8000
> Cypher [prepared with 1 execution engine] (microseconds): 76451   result 'i': 9000  testing equality on: 9000
> Cypher [prepared with 1 execution engine] (microseconds): 86732   result 'i': 10000  testing equality on: 10000
> 
> As we can clearly see, Java can "cheat" on low equality tests (we break 
> after finding the node, as we assume (and know in our context) that 'i' is 
> unique), but more interestingly, even when it fails (checking {max} >= 100k) 
> it is still roughly 2 times as fast as Cypher, both for initial queries and 
> subsequent ones (this is also a good test for non-unique properties, as it 
> forces Java to iterate through all of the nodes present).
> 
> This shows two things in my view:
> 1) Java can optimise for unique results. As far as I am aware, Cypher cannot 
> be told to stop when it finds a result we know is unique (such as the ISBN 
> of a book, or any other unique property of a node).
> 2) For non-unique results (or for a failed query) Java is still faster than 
> Cypher.
> 
> After getting these results my curiosity prompted me to expand the scope by 
> adding relationships and a second attribute to see if the same trend 
> continues, and it did.
> 
> Links to code snippets:
> 
> first query:
> https://gist.github.com/anonymous/9601553
> 
> second query:
> https://gist.github.com/anonymous/9601556
> 
> third query:
> https://gist.github.com/anonymous/9601571
> 
> fourth:
> https://gist.github.com/anonymous/9601581
> 
> fifth:
> https://gist.github.com/anonymous/9601590
> 
> entire result set link:
> https://gist.github.com/anonymous/9601486
> 
> Discussion:
> 
> My main question is whether these results are to be expected (or even 
> obvious to some), which would mean I will just use native Java in my 
> application.
> If my results are wrong or misleading I would appreciate knowing why, but if 
> not, a discussion on how to improve Cypher to close the gap may be useful.
> 
> Other notes and observations I had (inconclusive, as the tests performed 
> were not as thorough as the above):
> - using 'count' in Cypher seems to destroy execution time (whereas in SQL it 
>   normally improves it).
> - adding depth to a Cypher search (for example going from (a)-[]->(b) to 
>   (a)-[]->(b)-[]->(c)) seems to scale a lot worse in Cypher than in Java.
> 
> Thank you for reading this wall,
> 
> Costas
> 
> 
>  
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
