Happy I could help. Please consider blogging about it after you worked out your solution. And don't hesitate to come back with other questions.
Cheers, Michael Am 20.03.2014 um 11:36 schrieb [email protected]: > Dear Michael, > > Thank you for your detailed response and the multiple clarifications. > > I was considering some of these concerns (but did not want to add an even > longer amount of tests) but there were some (like the limit expression) I was > not aware of at all. > > It has been very helpful in furthering my understanding of Neo. > > Cheers, > > > On Monday, March 17, 2014 4:05:41 PM UTC, [email protected] wrote: > Author's note: Even though this post seems to be partially delving into the > technical characteristics of Cypher it has been created as an initiator for > discussion on relative scalability and performance of native vs. cypher so I > believed it should be placed in the group forum as opposed to stack overflow. > If the coordinators disagree on this decision please move it to SO. Thank you. > > * neo4j version, library versions, OS, jdk: > > Neo4j Stable Release 2.0.1 [community edition], with relevant libraries. > OS: windows 7 (64 bit), service pack 1, 8gb ram, i5 quad core desktop > (without hyper-threading). > jdk: java 7 (jdk1.7.0_51). > all applications are in java, using embedded database instance (java runtime > and neo4j vm/other arguments are default (empty) in each case). > > Fellow Neo4J enthusiasts, > After performing various simple benchmarks it would appear that using native > java to query a neo graph outperforms doing the same using cypher. > This would normally be surprising in a relational context but i can see how > it can make sense in this graph context (as virtual memory is heavily used > etc.). > As such I would like to know if this is in fact the case for the majority of > queries that can be made in neo4j, or whether I am missing something > glaringly obvious in my tests. > > A summary of my test parameters is as follows: > a graph of 100k nodes (each having one property) is used as a baseline > (relationships and more properties are added for later experiments). > after graph insertion the jvm is restarted before querying in order to clear > virtual memory. > the same query is performed using cypher and neo4j in each case. > the jvm is restarted for every subsequent query experiment to ensure virtual > memory is wiped (vm is embedded into java heap in windows by default). > during each experiment 10 repetitions of the query are performed, with each > repetition having a small change in one of the query variables. This is to > test execution time for the first instance (hence before any nodes are in > virtual memory) as well as subsequent calls where most of the nodes queried > are in RAM. > execution time is recorded using System.nanotime() for accuracy. > the cypher queries have been optimised as much as possible (with my current > knowledge) using a single execution engine per experiment and the changing > variable is injected as a parameter to the query. > Hypothesis: > > For all experiments (details below) both for initial query execution as well > as subsequent executions, the native java api outperforms cypher. > > The following queries were performed (code presented below, this is a summary > in text for conciseness): > get me all nodes which have property 'i' equal to {max}. NB: The variable > {max} ranges from 0 to 100k in the nodes and the queries test all possible > ranges to ensure that the java execution does not have a preferential > treatment by finding results close to the top of the node file (as is shown > in the results). > get me all nodes which have property 'i' equal to {max}, through a single > indexed node. In my context (model-driven engineering research) it is common > to have a single (or very few) starting points for a query, so my second test > simulated this behaviour by creating a single "source" node with relationship > to the 100k nodes to be queried. as such, the query goes to the lucene index > to find the node and then traverses the relationship it has to the 100k nodes > in order to be executed. This also avoids using the > GlobalGraphOperations.at(database).getAllNodes() operation in java (which is > useful as it would never be used in my context). > get me all nodes which have property 'i' equal to {max} and relationship > named (of relationship type with name) {name} to another node. this is a > simple extension to the first query which uses a one-hop traversal as well. > get me all nodes which have property 'i' equal to {max} and 'i2' equal to > {max2}. > get me all nodes which have property 'i' equal to {max} and 'i2' equal to > {max2} and relationship named {name} to another node. > Results: > > I will only present the detailed results of the first query as it would get > tediously long to present them all (they are all included as a snippet link > below), and as mentioned above, all of them seem to support the statement > that java > outperforms cypher. > > Query 1 results: > > Java (microseconds): 15415 result 'i': 1 testing equality on: 1 > Java (microseconds): 288 result 'i': 2 testing equality on: 2 > Java (microseconds): 333 result 'i': 3 testing equality on: 3 > Java (microseconds): 304 result 'i': 4 testing equality on: 4 > Java (microseconds): 303 result 'i': 5 testing equality on: 5 > Java (microseconds): 319 result 'i': 6 testing equality on: 6 > Java (microseconds): 331 result 'i': 7 testing equality on: 7 > Java (microseconds): 355 result 'i': 8 testing equality on: 8 > Java (microseconds): 368 result 'i': 9 testing equality on: 9 > Java (microseconds): 385 result 'i': 10 testing equality on: 10 > > Java (microseconds): 175027 result 'i': 1000 testing equality on: 1000 > Java (microseconds): 146228 result 'i': 2000 testing equality on: 2000 > Java (microseconds): 126249 result 'i': 3000 testing equality on: 3000 > Java (microseconds): 98282 result 'i': 4000 testing equality on: 4000 > Java (microseconds): 69881 result 'i': 5000 testing equality on: 5000 > Java (microseconds): 38536 result 'i': 6000 testing equality on: 6000 > Java (microseconds): 24090 result 'i': 7000 testing equality on: 7000 > Java (microseconds): 25140 result 'i': 8000 testing equality on: 8000 > Java (microseconds): 25849 result 'i': 9000 testing equality on: 9000 > Java (microseconds): 26664 result 'i': 10000 testing equality on: 10000 > > Java (microseconds): 1704711 result 'i': 99997 testing equality on: 99997 > Java (microseconds): 119149 result 'i': 99998 testing equality on: 99998 > Java (microseconds): 49827 result 'i': 99999 testing equality on: 99999 > Java (microseconds): 60392 result 'i': -1 testing equality on: 100000 > Java (microseconds): 42451 result 'i': -1 testing equality on: 100001 > Java (microseconds): 35205 result 'i': -1 testing equality on: 100002 > Java (microseconds): 36279 result 'i': -1 testing equality on: 100003 > Java (microseconds): 34999 result 'i': -1 testing equality on: 100004 > Java (microseconds): 35179 result 'i': -1 testing equality on: 100005 > Java (microseconds): 45571 result 'i': -1 testing equality on: 100006 > > Cypher [prepared with 1 execution engine] (microseconds): 2688552 result > 'i': 100 testing equality on: 100 > Cypher [prepared with 1 execution engine] (microseconds): 134839 result 'i': > 200 testing equality on: 200 > Cypher [prepared with 1 execution engine] (microseconds): 116128 result 'i': > 300 testing equality on: 300 > Cypher [prepared with 1 execution engine] (microseconds): 96070 result 'i': > 400 testing equality on: 400 > Cypher [prepared with 1 execution engine] (microseconds): 111627 result 'i': > 500 testing equality on: 500 > Cypher [prepared with 1 execution engine] (microseconds): 116955 result 'i': > 600 testing equality on: 600 > Cypher [prepared with 1 execution engine] (microseconds): 98720 result 'i': > 700 testing equality on: 700 > Cypher [prepared with 1 execution engine] (microseconds): 96051 result 'i': > 800 testing equality on: 800 > Cypher [prepared with 1 execution engine] (microseconds): 106406 result 'i': > 900 testing equality on: 900 > Cypher [prepared with 1 execution engine] (microseconds): 97068 result 'i': > 1000 testing equality on: 1000 > > Cypher [prepared with 1 execution engine] (microseconds): 2651371 result > 'i': 1000 testing equality on: 1000 > Cypher [prepared with 1 execution engine] (microseconds): 121623 result 'i': > 2000 testing equality on: 2000 > Cypher [prepared with 1 execution engine] (microseconds): 95211 result 'i': > 3000 testing equality on: 3000 > Cypher [prepared with 1 execution engine] (microseconds): 79345 result 'i': > 4000 testing equality on: 4000 > Cypher [prepared with 1 execution engine] (microseconds): 88915 result 'i': > 5000 testing equality on: 5000 > Cypher [prepared with 1 execution engine] (microseconds): 100527 result 'i': > 6000 testing equality on: 6000 > Cypher [prepared with 1 execution engine] (microseconds): 77890 result 'i': > 7000 testing equality on: 7000 > Cypher [prepared with 1 execution engine] (microseconds): 77430 result 'i': > 8000 testing equality on: 8000 > Cypher [prepared with 1 execution engine] (microseconds): 76451 result 'i': > 9000 testing equality on: 9000 > Cypher [prepared with 1 execution engine] (microseconds): 86732 result 'i': > 10000 testing equality on: 10000 > > As we can clearly see, Java can "cheat" on low equality tests (as we break > after finding the node as we assume (and know in our context) 'i' is unique) > but more interestingly even when it fails (checking i > 100k) it is still > roughly 2 > times as fast as cypher both for initial queries and subsequent ones (this is > a good test for non-unique properties too as it forces java to iterate > through all of the nodes present). > > This shows two things in my view: > 1) Java can optimise for unique results. As far as I am aware cypher cannot > be told to stop when it finds a result we know is unique (such as an ISBN of > a book for example or any other unique property in a node). > 2) For non-unique results (or for a failed query) it is still faster than > cypher. > > After getting these results my curiosity prompted me to expand the scope by > adding relationships and a second attribute to see if the same trend > continues, and it did. > > Links to code snippets: > > first query: > https://gist.github.com/anonymous/9601553 > > second query: > https://gist.github.com/anonymous/9601556 > > third query: > https://gist.github.com/anonymous/9601571 > > fourth: > https://gist.github.com/anonymous/9601581 > > fifth: > https://gist.github.com/anonymous/9601590 > > entire result set link: > https://gist.github.com/anonymous/9601486 > > Discussion: > > My main question is whether these results are to be expected (or even obvious > to some) which would mean I will just use native java in my application. > If my results are wrong/misleading I would appreciate knowing why but if not, > a discussion on how to improve cypher to attempt to close the gap may be > useful. > > Other notes and observations I had (non-conclusive as the tests performed > were not as thorough as the above): > using 'count' in cypher seems to destroy execution time (whereas normally in > sql it improves it). > adding depth to a cypher search (for example going from (a)-[]->(b) to > (a)-[]->(b)-[]->(c)) seems to scale a lot worst in cypher than java. > > Thank you for reading this wall, > > Costas > > > > > > > > > -- > You received this message because you are subscribed to the Google Groups > "Neo4j" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
