Hi Costas,
this is quite an email and exercise.
In general the Java APIs outperform cypher as they don't include parsing
queries, transforming results etc and are already pre-compiled by the java
compiler to effective bytecode on the jvm.
Cypher runs more like an interpreter on top of it.
Currently we don't fully leverage the underlying SPI in Cypher in all places so
that it runs on a higher level API (kind of on top of the Java API) but this is
bound to change.
There are some issues with your tests:
- you don't try to do the same in Cypher and Java, your Java code shortcuts and
leaves off operations that cypher still does
- eg. to stop execution after the first it you would use LIMIT 1 at the
return, also in cypher strings out of the results, something you don't do in
java
- and you string-concatenate the queries although the could be constants
- your queries are not really graph queries but full scans over all nodes
loading them all and comparing on properties, in reality you almost never do
something like that
- instead you use indexes and unique constraints in conjunction with labels to
exactly load a single item if you do equality lookups
- your graph model is not really realistic, having a single node with many
relationships is not what you would navigate from
So while your general observation is still correct you should make sure that
your methods are
#1 comparable
#2 using realistic operations
E.g. after changing your App2 a bit to accomodate some of what I said, i got:
Java (microseconds): 106944 result 'i': 100 testing equality on:
100
Cypher [prepared with 1 execution engine] (microseconds): 98206 result 'i':
100 testing equality on: 100
You see, it's not necessarily like that.
public static final String QUERY = "START m=node:index('id:indexnode')
MATCH (m)-->(n) WHERE n.i={max} return n.i as i LIMIT 1";
and if you created a unique constraint on :node(id)
and use a query like this: MATCH (m)-->(n:node {i:{max}}) return n.i as i LIMIT
1
you get these numbers:
Cypher [prepared with 1 execution engine] (microseconds): 842 result 'i':
indexnode testing equality on: 100
Java (microseconds): 356 result 'i': indexnode testing equality on:
100
see: https://gist.github.com/jexp/9607779
So using the right graph model and the correct indexes makes much more
difference than just choosing between Java and Cypher.
MacOSX 10.9
Java 1.7_b45
Neo4j 2.0.1
16GB RAM, defaults
Thanks again for all your work.
Cheers,
Michael
----
(michael)-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
Learn Online, Offline or Read a Book (in Deutsch)
We're trading T-shirts for cool Graph Models
Am 17.03.2014 um 17:05 schrieb [email protected]:
> Author's note: Even though this post seems to be partially delving into the
> technical characteristics of Cypher it has been created as an initiator for
> discussion on relative scalability and performance of native vs. cypher so I
> believed it should be placed in the group forum as opposed to stack overflow.
> If the coordinators disagree on this decision please move it to SO. Thank you.
>
> * neo4j version, library versions, OS, jdk:
>
> Neo4j Stable Release 2.0.1 [community edition], with relevant libraries.
> OS: windows 7 (64 bit), service pack 1, 8gb ram, i5 quad core desktop
> (without hyper-threading).
> jdk: java 7 (jdk1.7.0_51).
> all applications are in java, using embedded database instance (java runtime
> and neo4j vm/other arguments are default (empty) in each case).
>
> Fellow Neo4J enthusiasts,
> After performing various simple benchmarks it would appear that using native
> java to query a neo graph outperforms doing the same using cypher.
> This would normally be surprising in a relational context but i can see how
> it can make sense in this graph context (as virtual memory is heavily used
> etc.).
> As such I would like to know if this is in fact the case for the majority of
> queries that can be made in neo4j, or whether I am missing something
> glaringly obvious in my tests.
>
> A summary of my test parameters is as follows:
> a graph of 100k nodes (each having one property) is used as a baseline
> (relationships and more properties are added for later experiments).
> after graph insertion the jvm is restarted before querying in order to clear
> virtual memory.
> the same query is performed using cypher and neo4j in each case.
> the jvm is restarted for every subsequent query experiment to ensure virtual
> memory is wiped (vm is embedded into java heap in windows by default).
> during each experiment 10 repetitions of the query are performed, with each
> repetition having a small change in one of the query variables. This is to
> test execution time for the first instance (hence before any nodes are in
> virtual memory) as well as subsequent calls where most of the nodes queried
> are in RAM.
> execution time is recorded using System.nanotime() for accuracy.
> the cypher queries have been optimised as much as possible (with my current
> knowledge) using a single execution engine per experiment and the changing
> variable is injected as a parameter to the query.
> Hypothesis:
>
> For all experiments (details below) both for initial query execution as well
> as subsequent executions, the native java api outperforms cypher.
>
> The following queries were performed (code presented below, this is a summary
> in text for conciseness):
> get me all nodes which have property 'i' equal to {max}. NB: The variable
> {max} ranges from 0 to 100k in the nodes and the queries test all possible
> ranges to ensure that the java execution does not have a preferential
> treatment by finding results close to the top of the node file (as is shown
> in the results).
> get me all nodes which have property 'i' equal to {max}, through a single
> indexed node. In my context (model-driven engineering research) it is common
> to have a single (or very few) starting points for a query, so my second test
> simulated this behaviour by creating a single "source" node with relationship
> to the 100k nodes to be queried. as such, the query goes to the lucene index
> to find the node and then traverses the relationship it has to the 100k nodes
> in order to be executed. This also avoids using the
> GlobalGraphOperations.at(database).getAllNodes() operation in java (which is
> useful as it would never be used in my context).
> get me all nodes which have property 'i' equal to {max} and relationship
> named (of relationship type with name) {name} to another node. this is a
> simple extension to the first query which uses a one-hop traversal as well.
> get me all nodes which have property 'i' equal to {max} and 'i2' equal to
> {max2}.
> get me all nodes which have property 'i' equal to {max} and 'i2' equal to
> {max2} and relationship named {name} to another node.
> Results:
>
> I will only present the detailed results of the first query as it would get
> tediously long to present them all (they are all included as a snippet link
> below), and as mentioned above, all of them seem to support the statement
> that java
> outperforms cypher.
>
> Query 1 results:
>
> Java (microseconds): 15415 result 'i': 1 testing equality on: 1
> Java (microseconds): 288 result 'i': 2 testing equality on: 2
> Java (microseconds): 333 result 'i': 3 testing equality on: 3
> Java (microseconds): 304 result 'i': 4 testing equality on: 4
> Java (microseconds): 303 result 'i': 5 testing equality on: 5
> Java (microseconds): 319 result 'i': 6 testing equality on: 6
> Java (microseconds): 331 result 'i': 7 testing equality on: 7
> Java (microseconds): 355 result 'i': 8 testing equality on: 8
> Java (microseconds): 368 result 'i': 9 testing equality on: 9
> Java (microseconds): 385 result 'i': 10 testing equality on: 10
>
> Java (microseconds): 175027 result 'i': 1000 testing equality on: 1000
> Java (microseconds): 146228 result 'i': 2000 testing equality on: 2000
> Java (microseconds): 126249 result 'i': 3000 testing equality on: 3000
> Java (microseconds): 98282 result 'i': 4000 testing equality on: 4000
> Java (microseconds): 69881 result 'i': 5000 testing equality on: 5000
> Java (microseconds): 38536 result 'i': 6000 testing equality on: 6000
> Java (microseconds): 24090 result 'i': 7000 testing equality on: 7000
> Java (microseconds): 25140 result 'i': 8000 testing equality on: 8000
> Java (microseconds): 25849 result 'i': 9000 testing equality on: 9000
> Java (microseconds): 26664 result 'i': 10000 testing equality on: 10000
>
> Java (microseconds): 1704711 result 'i': 99997 testing equality on: 99997
> Java (microseconds): 119149 result 'i': 99998 testing equality on: 99998
> Java (microseconds): 49827 result 'i': 99999 testing equality on: 99999
> Java (microseconds): 60392 result 'i': -1 testing equality on: 100000
> Java (microseconds): 42451 result 'i': -1 testing equality on: 100001
> Java (microseconds): 35205 result 'i': -1 testing equality on: 100002
> Java (microseconds): 36279 result 'i': -1 testing equality on: 100003
> Java (microseconds): 34999 result 'i': -1 testing equality on: 100004
> Java (microseconds): 35179 result 'i': -1 testing equality on: 100005
> Java (microseconds): 45571 result 'i': -1 testing equality on: 100006
>
> Cypher [prepared with 1 execution engine] (microseconds): 2688552 result
> 'i': 100 testing equality on: 100
> Cypher [prepared with 1 execution engine] (microseconds): 134839 result 'i':
> 200 testing equality on: 200
> Cypher [prepared with 1 execution engine] (microseconds): 116128 result 'i':
> 300 testing equality on: 300
> Cypher [prepared with 1 execution engine] (microseconds): 96070 result 'i':
> 400 testing equality on: 400
> Cypher [prepared with 1 execution engine] (microseconds): 111627 result 'i':
> 500 testing equality on: 500
> Cypher [prepared with 1 execution engine] (microseconds): 116955 result 'i':
> 600 testing equality on: 600
> Cypher [prepared with 1 execution engine] (microseconds): 98720 result 'i':
> 700 testing equality on: 700
> Cypher [prepared with 1 execution engine] (microseconds): 96051 result 'i':
> 800 testing equality on: 800
> Cypher [prepared with 1 execution engine] (microseconds): 106406 result 'i':
> 900 testing equality on: 900
> Cypher [prepared with 1 execution engine] (microseconds): 97068 result 'i':
> 1000 testing equality on: 1000
>
> Cypher [prepared with 1 execution engine] (microseconds): 2651371 result
> 'i': 1000 testing equality on: 1000
> Cypher [prepared with 1 execution engine] (microseconds): 121623 result 'i':
> 2000 testing equality on: 2000
> Cypher [prepared with 1 execution engine] (microseconds): 95211 result 'i':
> 3000 testing equality on: 3000
> Cypher [prepared with 1 execution engine] (microseconds): 79345 result 'i':
> 4000 testing equality on: 4000
> Cypher [prepared with 1 execution engine] (microseconds): 88915 result 'i':
> 5000 testing equality on: 5000
> Cypher [prepared with 1 execution engine] (microseconds): 100527 result 'i':
> 6000 testing equality on: 6000
> Cypher [prepared with 1 execution engine] (microseconds): 77890 result 'i':
> 7000 testing equality on: 7000
> Cypher [prepared with 1 execution engine] (microseconds): 77430 result 'i':
> 8000 testing equality on: 8000
> Cypher [prepared with 1 execution engine] (microseconds): 76451 result 'i':
> 9000 testing equality on: 9000
> Cypher [prepared with 1 execution engine] (microseconds): 86732 result 'i':
> 10000 testing equality on: 10000
>
> As we can clearly see, Java can "cheat" on low equality tests (as we break
> after finding the node as we assume (and know in our context) 'i' is unique)
> but more interestingly even when it fails (checking i > 100k) it is still
> roughly 2
> times as fast as cypher both for initial queries and subsequent ones (this is
> a good test for non-unique properties too as it forces java to iterate
> through all of the nodes present).
>
> This shows two things in my view:
> 1) Java can optimise for unique results. As far as I am aware cypher cannot
> be told to stop when it finds a result we know is unique (such as an ISBN of
> a book for example or any other unique property in a node).
> 2) For non-unique results (or for a failed query) it is still faster than
> cypher.
>
> After getting these results my curiosity prompted me to expand the scope by
> adding relationships and a second attribute to see if the same trend
> continues, and it did.
>
> Links to code snippets:
>
> first query:
> https://gist.github.com/anonymous/9601553
>
> second query:
> https://gist.github.com/anonymous/9601556
>
> third query:
> https://gist.github.com/anonymous/9601571
>
> fourth:
> https://gist.github.com/anonymous/9601581
>
> fifth:
> https://gist.github.com/anonymous/9601590
>
> entire result set link:
> https://gist.github.com/anonymous/9601486
>
> Discussion:
>
> My main question is whether these results are to be expected (or even obvious
> to some) which would mean I will just use native java in my application.
> If my results are wrong/misleading I would appreciate knowing why but if not,
> a discussion on how to improve cypher to attempt to close the gap may be
> useful.
>
> Other notes and observations I had (non-conclusive as the tests performed
> were not as thorough as the above):
> using 'count' in cypher seems to destroy execution time (whereas normally in
> sql it improves it).
> adding depth to a cypher search (for example going from (a)-[]->(b) to
> (a)-[]->(b)-[]->(c)) seems to scale a lot worst in cypher than java.
>
> Thank you for reading this wall,
>
> Costas
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.