I also ran some experiments but haven't had time to finish yet; here are my observations so far:

ArangoDB Measurement

- index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
- seraph -> replace with node-neo4j 2.0.0-RC1
  - seraph uses the two-year-old /cypher API and doesn't send the X-Stream:true header
  - does not do efficient auth (re-encodes credentials on every call)
  - doesn't do connection pooling
- suboptimal queries
- make sure the concurrency level is adequate for the setup (utilize all cores but don't flood the server, e.g. use async.eachLimit)
- warmup with nodes and rels `MATCH ()--() return count(*);`
- enterprise with better vertical read/write scalability vs. community
- Use 12G-24G heap, 2G new gen (-Xmn2G)
- pagecache to 2.5G + growth (e.g. another 2.5G)
- in 2.2 set cache_type = soft or cache_type=none depending on available heap
- fix the property encoding, e.g. store AGE as int not string, and don't store the string "null" at all!
  -> affects especially the aggregation query
- don't re-run the benchmark on the same store; start from the initial one
  -> creating and deleting the additional PROFILES_TEMP nodes affects the repeatability of results
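The concurrency cap mentioned in the list above (async.eachLimit) can be sketched without the dependency; this is a hypothetical, dependency-free equivalent for illustration:

```javascript
// Dependency-free sketch of async.eachLimit: run `worker` over `items`
// with at most `limit` calls in flight at once. Hypothetical helper --
// the benchmark itself would use eachLimit from the `async` npm package.
function eachLimit(items, limit, worker, done) {
    var index = 0, running = 0, finished = false;

    if (items.length === 0) return done(null);

    function launch() {
        while (running < limit && index < items.length) {
            running++;
            worker(items[index++], function (err) {
                if (finished) return;
                running--;
                if (err) { finished = true; return done(err); }
                if (index >= items.length && running === 0) {
                    finished = true;
                    return done(null);
                }
                launch();
            });
        }
    }
    launch();
}

// Usage sketch: process items with at most 2 concurrent "requests".
var results = [];
eachLimit([1, 2, 3, 4, 5], 2, function (n, cb) {
    setImmediate(function () { results.push(n * n); cb(null); });
}, function (err) {
    if (err) throw err;
    console.log(results.length); // 5
});
```

This keeps all cores busy without flooding the server with 100,000 simultaneous connections.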

correct datatypes:

* the string "null" should *never be stored*
* int: public, gender, completion_percentage, AGE
* long/time: last_login, registration
* optionally as label: gender, public

  -> test repository (WIP): with changes in description.js and benchmark.js
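As a sketch, the coercion could happen once in JavaScript before the data is written (hypothetical helper; field lists follow the datatype notes above, field names taken from the sample document quoted further down):

```javascript
// Sketch: coerce Pokec profile properties to proper types before import
// instead of storing everything (including the string "null") as-is.
// Hypothetical helper for illustration only.
var INT_FIELDS  = ['public', 'gender', 'completion_percentage', 'AGE'];
var TIME_FIELDS = ['last_login', 'registration'];

function cleanProfile(raw) {
    var clean = {};
    Object.keys(raw).forEach(function (key) {
        var value = raw[key];
        if (value === 'null' || value == null) return; // drop "null" entirely
        if (INT_FIELDS.indexOf(key) !== -1) {
            clean[key] = parseInt(value, 10);          // int, not string
        } else if (TIME_FIELDS.indexOf(key) !== -1) {
            // "2012-05-25 11:20:00.0" -> epoch millis (long)
            clean[key] = new Date(value.replace(' ', 'T')).getTime();
        } else {
            clean[key] = value;
        }
    });
    return clean;
}

var p = cleanProfile({ _key: 'P/P1', AGE: '26', gender: '1', body_type: 'null' });
// p -> { _key: 'P/P1', AGE: 26, gender: 1 } ("null" dropped, ints parsed)
```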


queries for neo4j-shell:

export from="P/P1"
export to="P/P277"

export key="P/P1"

// warmup
MATCH ()--() return count(*);
// 61,245,128 rows

MATCH (s:PROFILES) return count(*);
// 1,632,803 profiles
// 1.15 s

// the profile prefix shows the execution plan
profile
MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
// 295 rows 5 ms


// 1st degree neighbours
MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
// 14 rows 1ms 

// 2nd degree neighbours
MATCH (s:PROFILES {_key:{key}})-->(x)
MATCH (x)-->(n:PROFILES)
RETURN DISTINCT n._key;
// 283 rows 6 ms

// shortest path
MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
// 1 ms; return only the keys, not the full node data, same as in the other DBs

// aggregation
MATCH (f:PROFILES) RETURN f.AGE, count(*);
// 22 s -> should rather be ~1.5 s

// single read
MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
// or
MATCH (s:PROFILES {_key:{key}}) RETURN s;
// 1 row with 59 properties 1 ms

// single writes
CREATE (s:PROFILES_TEMP {data}) RETURN id(s);

// delete all nodes with a certain label
// loop until returns 0
MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) as deleted
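The "loop until it returns 0" pattern from the comment can be sketched generically; `runBatch` is a hypothetical callback standing in for one round trip of the delete statement above:

```javascript
// Sketch: repeat a batched delete until one round deletes 0 rows.
// `runBatch(cb)` is a hypothetical function that executes the
// PROFILES_TEMP delete statement once and yields the `deleted` count.
function deleteUntilEmpty(runBatch, done) {
    runBatch(function (err, deleted) {
        if (err) return done(err);
        if (deleted === 0) return done(null);  // store is clean, stop
        deleteUntilEmpty(runBatch, done);      // next batch of 5000
    });
}

// Usage sketch with a fake batch runner simulating 12000 leftover nodes.
var remaining = 12000;
function fakeBatch(cb) {
    var deleted = Math.min(5000, remaining);
    remaining -= deleted;
    setImmediate(function () { cb(null, deleted); });
}
deleteUntilEmpty(fakeBatch, function (err) {
    if (err) throw err;
    console.log('remaining:', remaining); // remaining: 0
});
```

Batching keeps each delete transaction small instead of building one huge transaction for all nodes.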
----

MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key as key RETURN count(*);
// 295 count 5-6ms

MATCH (f:PROFILES) return id(f) % 140, count(*);
// 140 rows -> 1502 ms; that's how fast the aggregation should be

sample data:

_key:"P/P1",
public:"1",
completion_percentage:"14",
gender:"1",
region:"zilinsky kraj, zilina",
last_login:"2012-05-25 11:20:00.0",
registration:"2005-04-03 00:00:00.0",
AGE:26,
body:"185 cm, 90 kg",
I_am_working_in_field:"it",
spoken_languages:"anglicky",
hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
I_most_enjoy_good_food:"v dobrej restauracii",
pets:"mam psa",
body_type:"null",
my_eyesight:"null",
eye_color:"null",
hair_color:"null",
hair_type:"null",
completed_level_of_education:"null",
favourite_color:"null",
relation_to_smoking:"null",
relation_to_alcohol:"null",
sign_in_zodiac:"null",
on_pokec_i_am_looking_for:"null",
love_is_for_me:"null",
relation_to_casual_sex:"null",
my_partner_should_be:"null",
marital_status:"null",
children:"null",
relation_to_children:"null",
I_like_movies:"null",
I_like_watching_movie:"null",
I_like_music:"null",
I_mostly_like_listening_to_music:"null",
the_idea_of_good_evening:"null",
I_like_specialties_from_kitchen:"null",
fun:"null",
I_am_going_to_concerts:"null",
my_active_sports:"null",
my_passive_sports:"null",
profession:"null",
I_like_books:"null",
life_style:"null",
music:"null",
cars:"null",
politics:"null",
relationships:"null",
art_culture:"null",
hobbies_interests:"null",
science_technologies:"null",
computers_internet:"null",
education:"null",
sport:"null",
movies:"null",
travelling:"null",
health:"null",
companies_brands:"null",
more:"null"


neo4j-server.properties:
org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
org.neo4j.server.webserver.port=8474
dbms.security.auth_enabled=false


neo4j-wrapper.conf:
wrapper.java.initmemory=8000
wrapper.java.maxmemory=8000
wrapper.java.additional=-Xmn2G

neo4j.properties:
dbms.pagecache.memory=5G
keep_logical_logs=false
remote_shell_enabled=false
cache_type=soft
_online_backup_enabled_=false

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Attachment: Comparison Neo4j Social.pdf


Neo4j Enterprise 2.2.2

node benchmark.js neo4j 
INFO using server address  127.0.0.1
INFO start
INFO warmup done, relationships 1632803
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 451 ms
INFO Average: 0.9 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545170
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 5926 ms
INFO Average: 11.8 ms
INFO -----------------------------------------------------------------------------
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 80 ms
INFO Average: 4.2 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 95963 ms
INFO Average: 0.96 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 22873 ms
INFO Average: 22873 ms
INFO -----------------------------------------------------------------------------
INFO executing single write with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single writes, 100000 items
INFO Total Time for 100000 requests: 176335 ms
INFO Average: 1.76 ms
INFO -----------------------------------------------------------------------------

Neo4j Enterprise 2.3-SNAPSHOT


wuqour:nosql-tests mh$ node benchmark.js neo4j
INFO using server address  127.0.0.1
INFO start
INFO warmup done, relationships 1632803
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 479 ms
INFO Average: 0.96 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545170
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 2935 ms
INFO Average: 5.9 ms
INFO -----------------------------------------------------------------------------
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 69 ms
INFO Average: 3.63 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 76377 ms
INFO Average: 0.76 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 3413 ms
INFO Average: 3413 ms
INFO -----------------------------------------------------------------------------
INFO executing single write with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single writes, 100000 items
INFO Total Time for 100000 requests: 155658 ms
INFO Average: 1.56 ms
INFO -----------------------------------------------------------------------------
DONE



On 10.06.2015 at 16:08, Aseem Kishore <[email protected]> wrote:

No problem at all. Glad it helped!

On Wednesday, June 10, 2015 at 9:04:09 AM UTC-4, Frank Celler wrote:
Hi Aseem,

I have changed the tests and can confirm that your node.js driver works as expected. I'm now able to restrict the number of connections and use keep-alive. That has indeed helped with the performance. I've updated the blog post accordingly.

Thanks a lot for all your help
  Frank

On Tuesday, June 9, 2015 at 04:34:22 UTC+2, Aseem Kishore wrote:
Hi Frank,

Author of "the" node-neo4j here.


Unfortunately, `npm install node-neo4j` is *not* this driver. It's a different one. "This" node-neo4j is `npm install neo4j`. The version you want is indeed 2.0.0-RC1.


You'll need to change your code from `new neo4j(...)` to `new neo4j.GraphDatabase(...)`, and from `db.cypherQuery` to `db.cypher`. Full API docs here for now:


Now for the behavior you're seeing where a new connection is being made for every query: that's really odd. What version of Node.js are you running? Node 0.10 and up use Keep-Alive by default under load, so you should not see that there:


And Node 0.12 and io.js improve this support:


Nonetheless, node-neo4j v2 does let you pass your own custom http.Agent, so you can control the connection pooling yourself if you like. E.g.:

var http = require('http');
var neo4j = require('neo4j');  // node-neo4j v2

var db = new neo4j.GraphDatabase({
    url: 'http://...',
    agent: new http.Agent({...}),
});

We run node-neo4j v2 on Node 0.10 in production at FiftyThree and are pretty satisfied with the performance under load. (We use a custom agent to isolate its connection pooling, and have its maxSockets currently set to 20 per Node process.) But perhaps you're exercising something different that we're not aware of.

Hope this helps though.

Cheers,
Aseem

On Monday, June 8, 2015 at 3:12:02 AM UTC-4, Frank Celler wrote:
I changed the index to a constraint and updated the page-cache.

However, I'm still struggling with the node.js driver. I've tried the "node-neo4j", which you get in version 2.0.3 using "npm install node-neo4j". I've created the database link using

    var db = new neo4j('http://neo4j:abc@' + host + ':7474');

but when running a lot of db.cypherQuery, I ended up with a lot of connections in TIME_WAIT:

    $ netstat -anpt | fgrep TIME_WAIT | wc
    1014    7098   98358

So, it seems that connections are not kept open. Is there a way to specify this? For example, the MongoDB driver has a 'poolSize' argument to specify how many connections should be kept open.

Thanks for your help
  Frank

On Sunday, June 7, 2015 at 14:25:34 UTC+2, Michael Hunger wrote:
Hi,

It would have been very nice to be contacted before such an article went out, instead of being called out as part of the post to "defend yourself". Just saying.

Seraph uses old, outdated, two-year-old APIs (/db/data/cypher and /db/data/node) which are not performant,
and it also misses relevant headers (e.g. X-Stream:true) for those.
It also doesn't support HTTP keep-alive.

I would either use requests directly or perhaps node-neo4j 2.x, would have to test though.

The configuration for Neo4j is also easy to improve; for your store, 2.5G of page-cache memory should be enough.
The warmup is also not sufficient.

And running the queries only once, i.e. with cold caches, is also a non-production approach.

I'm currently looking into it and will publish a blog post with my recommendations next week.

As we all know benchmark tests are always well suited to the publisher :)

The index should be a unique constraint instead.

Cheers, Michael

On 07.06.2015 at 12:33, Frank Celler <[email protected]> wrote:

Hi Christophe,

I'm Frank from ArangoDB. The author of the article, Claudius, is my colleague - he is currently not at his computer, therefore I'll try to answer your questions. Please let me know if you need more information. Any help with the queries is more than welcome. If we can improve them in any way, please let us know.

- we raised the ulimit as requested by neo4j when it started: open files (-n) 40000

- there is one index on PROFILES:

neo4j-sh (?)$ schema
Indexes
  ON :PROFILES(_key) ONLINE 

- as far as we understood, there is no need to create an index for edges

- we used "seraph" as the node.js driver, because that was recommended in the node user group

- we set

dbms.pagecache.memory=20g

(we were told in a talk that this is nowadays the only cache parameter that matters).

- we started with

./bin/neo4j start

- JVM is

java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

Thanks for your help
  Frank

On Friday, June 5, 2015 at 19:25:09 UTC+2, Christophe Willemsen wrote:
I have looked at their repository too. Most of the queries seem 'almost' correct, but there is no information concerning the real schema indexes, the configuration of the JVM, etc. Also, the results are reported as throughput, so I'll wait for someone more experienced with these kinds of benchmarks to reply.

On Friday, June 5, 2015 at 04:32:59 UTC+2, Michael Hunger wrote:
I'm currently on the road, but there are several things wrong with it. Will look into it in more detail in the next few days.

Michael

Sent from my iPhone

On 04.06.2015 at 12:57, Andrii Stesin <[email protected]> wrote:

Just ran into the following article (published supposedly today, Jun 04, 2015), which claims to contain a comparison of benchmark results: "Native multi-model can compete with pure document and graph databases". It makes me think that there is something wrong with either their data model or their test setup, because the results for Neo4j are surprisingly low.

Am I the only one out there who feel the same?

WBR,
Andrii


