Hi Michael, hi Jacob,

thanks a lot for your improvements. I've updated the tests as follows: (1) replaced the index with a unique constraint, (2) reduced the page cache memory to 2.5GB, (3) updated to Java 8, (4) added a JIT warmup by executing 2500 shortest paths before the test. This has improved the read results. However, the write test got much worse. The culprit seems to be (2): with 20GB of page cache memory, the writes are almost twice as fast.
I've tried both "node-neo4j" in version 2.0.3 and "neo4j" in version 2.0.0-RC1 instead of Seraph, without much success. It seems that these drivers also create a new connection for each Cypher query. Any hint on how to fix this is more than welcome. I could not find a configuration option in the documentation, but maybe I missed something.

Best regards,
Frank

On Monday, June 8, 2015 at 07:26:53 UTC+2, Jacob Hansson wrote:
>
> On Sunday, June 7, 2015 at 8:13:58 AM UTC-5, Frank Celler wrote:
>>
>> Hi Michael,
>>
>> Seraph uses old and outdated, 2-year-old APIs (/db/data/cypher and
>> /db/data/node) which are not performant, and also misses relevant headers
>> (e.g. X-Stream:true) for those. It also doesn't support HTTP keep-alive.
>>
>> Thanks for the clarification. We selected Seraph because your website
>> recommended it (http://neo4j.com/developer/javascript/). It was the first
>> one on that page, and the page did not suggest that it was outdated.
>> That's why we used it. I will rewrite the test to use the driver you
>> suggested.
>>
>> Is this the correct driver:
>>
>> https://www.npmjs.com/package/neo4j
>>
>> in version 2.0.0-RC1? Are there any configuration options I need to be
>> aware of to switch on keep-alive?
>>
>> Configuration for Neo4j is also easy to improve; for your store, 2.5G of
>> page-cache memory should be enough.
>>
>> I will reduce dbms.pagecache.memory to 2.5GB then (thanks to credits from
>> Google, the machine has a lot of memory, so we thought the more the
>> better).
>>
>> The warmup is also not sufficient.
>>
>> And running the queries once, i.e. with cold caches, is also a
>> non-production approach.
>>
>> There are some 2*10^12 possible combinations for shortest paths. In a
>> production environment, pairs are likely to be new and not cache hits.
>> Therefore we did not want to test query caches, but the real performance
>> of the computation.
>
> This sticks out as an odd assumption to me, for two reasons.
>
> One is that many of the optimizations that happen when you "warm up"
> Neo4j are independent of the data you're accessing - core methods of the
> database will not get inlined until they have executed 1500 or so times.
> Running the database as interpreted Java code is not representative of how
> it would be exercised in production. Similarly, Neo4j ships with a fairly
> sophisticated cost-based query planner and, when performing the initial
> query execution, the planner will estimate the cost of many, many
> different approaches to solving a query. In a production setting, that
> overhead would not be seen in regular operation.
>
> Two is that production access patterns are very rarely uniform. If they
> were, there'd be no need for most of the caching we do as computer
> engineers :) Nice as that would be, caching being on the list of the two
> hardest problems of comp sci and all, the proliferation of caching from
> the hardware layer up to memory clouds paints a strong picture. Good
> caching makes or breaks production systems, and if you are consciously
> engineering benchmarks to bypass them, you are not testing production
> behavior.
>
>>
>> I'm currently looking into it and will post a blog post with my
>> recommendations next week.
>>
>> Perfect, will rerun the tests using your suggestions.
>>
>> As we all know, benchmark tests are always well suited to the publisher :)
>>
>> The index should be a unique constraint instead.
>>
>> I will change this from an index to a unique constraint.
>>
>> Thanks
>> Frank
>>
>>
>> Cheers, Michael
>>
>>
>> On Sunday, June 7, 2015 at 14:25:34 UTC+2, Michael Hunger wrote:
>>>
>>> Hi,
>>>
>>> It would have been very nice to be contacted before such an article
>>> went out, and not to be called out as part of the post to "defend
>>> yourself". Just saying.
>>>
>>> Seraph uses old and outdated, 2-year-old APIs (/db/data/cypher and
>>> /db/data/node) which are not performant, and also misses relevant
>>> headers (e.g. X-Stream:true) for those. It also doesn't support HTTP
>>> keep-alive.
>>>
>>> I would either use requests directly or perhaps node-neo4j 2.x; would
>>> have to test, though.
>>>
>>> Configuration for Neo4j is also easy to improve; for your store, 2.5G
>>> of page-cache memory should be enough.
>>> The warmup is also not sufficient.
>>>
>>> And running the queries once, i.e. with cold caches, is also a
>>> non-production approach.
>>>
>>> I'm currently looking into it and will post a blog post with my
>>> recommendations next week.
>>>
>>> As we all know, benchmark tests are always well suited to the publisher :)
>>>
>>> The index should be a unique constraint instead.
>>>
>>> Cheers, Michael
>>>
>>> On 07.06.2015 at 12:33, Frank Celler <[email protected]> wrote:
>>>
>>> Hi Christophe,
>>>
>>> I'm Frank from ArangoDB. The author of the article, Claudius, is my
>>> colleague - he is currently not at his computer, therefore I'll try to
>>> answer your questions. Please let me know if you need more information.
>>> Any help with the queries is more than welcome. If we can improve them
>>> in any way, please let us know.
>>>
>>> - we raised the ulimit as requested by Neo4j when it started: open
>>> files (-n) 40000
>>>
>>> - there is one index on PROFILES:
>>>
>>> neo4j-sh (?)$ schema
>>> Indexes
>>> ON :PROFILES(_key) ONLINE
>>>
>>> - as far as we understood, there is no need to create an index for edges
>>>
>>> - we used "seraph" as the node.js driver, because that was recommended
>>> in the node user group
>>>
>>> - we set
>>>
>>> dbms.pagecache.memory=20g
>>>
>>> (we were told in a talk that this is nowadays the only cache parameter
>>> that matters).
>>>
>>> - we started with
>>>
>>> ./bin/neo4j start
>>>
>>> - the JVM is
>>>
>>> java version "1.7.0_79"
>>> Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
>>> Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
>>>
>>> Thanks for your help
>>> Frank
>>>
>>> On Friday, June 5, 2015 at 19:25:09 UTC+2, Christophe Willemsen wrote:
>>>>
>>>> I have looked at their repository too. Most of the queries seem
>>>> 'almost' correct, but there is no information concerning the actual
>>>> schema indexes, the configuration of the JVM, etc. Also, the results
>>>> are reported as throughput, so I'll wait for someone perhaps more
>>>> experienced with this kind of benchmark to reply to it.
>>>>
>>>> On Friday, June 5, 2015 at 04:32:59 UTC+2, Michael Hunger wrote:
>>>>>
>>>>> I'm currently on the road, but there are several things wrong with
>>>>> it. Will look into it in more detail in the next few days.
>>>>>
>>>>> Michael
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 04.06.2015 at 12:57, Andrii Stesin <[email protected]> wrote:
>>>>>
>>>>> Just ran into the following article (published supposedly today, Jun
>>>>> 04, 2015) which claims to contain a comparison of benchmark results:
>>>>> Native multi-model can compete with pure document and graph databases
>>>>> <https://www.arangodb.com/2015/06/multi-model-benchmark/>. It makes
>>>>> me think that there is something wrong with either their data model
>>>>> or with the test setup, because the results for Neo4j are
>>>>> surprisingly low.
>>>>>
>>>>> Am I the only one out there who feels the same?
>>>>>
>>>>> WBR,
>>>>> Andrii
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
