Re: [Neo4j] ArangoDB vs. Neo4j -- what's up? article of Jun 04, 2015

Frank Celler Fri, 12 Jun 2015 07:22:57 -0700

Hello Michael,

> - default transaction mode is per document tx


yes, that is true. The other databases (MongoDB, OrientDB, Postgres) behave 
similarly to ArangoDB. I had assumed that this is also the default 
behavior when sending a Cypher CREATE, namely that it runs in an implicit 
transaction. If that assumption is wrong, then we need to update the Neo4j 
driver to use the same semantics.

> - default consistency mode is eventual

Eventual consistency would only matter in a sharded environment. We have 
configured MongoDB and ArangoDB to sync to disk at least once per second.

> - arangodb uses in-memory indexes

yes, ArangoDB and MongoDB use in-memory indexes.

> - looking at the (long running) arangodb warmup code, it aggressively 
> pulls all data into memory

we tried to add code for each DB to allow it to pull the working set into 
memory. I've also added your warmup code, which gives a significant 
performance boost. Currently, the warmup takes 105 s for ArangoDB, 38 s for 
Neo4j and 50 s for MongoDB.

As soon as you have sorted out the other tests with Aseem, I will rerun the 
complete test. The machine has 60 GB RAM, so if there are any suitable JVM 
parameters, please let me know.

> are these statements correct?

> I'm just wondering, as Neo4j is a fully transactional database and 
> guarantees sync to disk on commit, whether this isn't an 
> apples-and-oranges comparison.

Neo4j only acknowledges a write request once the changed data has been 
synced to disk, correct?

If there is no way to switch this off, then we indeed need to split this 
test. ArangoDB can do both; MongoDB can only sync every second, although 
there are options in MongoDB for the client to explicitly wait for a sync 
to happen. I'll try to find out how to set up two tests.
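A toy sketch of the two durability modes such a split would compare (an illustration only, not how ArangoDB, MongoDB or Neo4j implement durability; `AppendLog` and its parameters are hypothetical): in one mode a write is acknowledged only after `fsync`, in the other the log is synced at most once per interval, so up to one interval's worth of acknowledged writes can be lost on a crash.

```python
import os
import time


class AppendLog:
    """Hypothetical append-only log with two durability modes."""

    def __init__(self, path: str, sync_on_commit: bool = True, sync_interval: float = 1.0):
        self._f = open(path, "ab")
        self.sync_on_commit = sync_on_commit
        self.sync_interval = sync_interval      # seconds between periodic syncs
        self._last_sync = time.monotonic()
        self.fsync_calls = 0

    def _sync(self) -> None:
        self._f.flush()                         # push Python's buffer to the OS
        os.fsync(self._f.fileno())              # ask the OS to put it on disk
        self.fsync_calls += 1
        self._last_sync = time.monotonic()

    def commit(self, record: bytes) -> None:
        self._f.write(record + b"\n")
        if self.sync_on_commit:
            # Durable mode: return (acknowledge) only after the data is on disk.
            self._sync()
        elif time.monotonic() - self._last_sync >= self.sync_interval:
            # Periodic mode: writes since the last sync are at risk on a crash.
            self._sync()

    def close(self) -> None:
        self._sync()
        self._f.close()
```

Timing `commit()` in each mode on the same disk shows the per-commit sync cost that the two test variants would separate.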

> Also, to save RAM, I didn't pull all the data into in-memory caches in 2.2.

The machine has enough memory. More memory-aggressive JVM parameters are 
totally OK.

 
> And I didn't understand the first-degree/second-degree neighbour 
> reasoning in the 2nd blog post. Why are there not 2 bars shown in the 
> chart, and how are they merged together?

The question asked was: "what are the distance 1 and 2 friends?" or, put 
another way: "what are all the different nodes that can be reached 
with a path of length 1 or 2?".
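A minimal sketch of that semantics over an in-memory adjacency map (`neighbours_1_and_2` is hypothetical, not the benchmark's code): the answer is the union of the 1-hop and 2-hop neighbourhoods with duplicates removed, so a node reachable both directly and via a two-hop path counts once. This also explains Michael's numbers further down: 14 first-degree rows plus 283 distinct second-degree rows is 297, while the combined DISTINCT query returns 295, so (assuming the 14 first-degree rows are distinct nodes) 2 nodes appear in both sets.

```python
from typing import Hashable, Iterable, Mapping, Set


def neighbours_1_and_2(graph: Mapping[Hashable, Iterable[Hashable]],
                       start: Hashable) -> Set[Hashable]:
    """Distinct nodes reachable from `start` by a directed path of length 1 or 2."""
    first = set(graph.get(start, ()))       # distance 1
    second: Set[Hashable] = set()
    for x in first:
        second.update(graph.get(x, ()))     # distance 2 (via any first-degree node)
    # The union drops nodes that are both 1 and 2 hops away, hence one bar, not two.
    return first | second
```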


> Also, I noticed that the chart at the beginning of the second blog post is 
> exactly the same as the one in the first, despite the statement above it 
> that "this was updated with the latest improvements".

That was for "update 4". With the new warmup, shortest path is really 
fast. For some reason, the aggregation is much slower, so I wanted to 
wait until you are finished.

Thanks for all the improvements
  Frank




On Friday, 12 June 2015 14:29:26 UTC+2, Michael Hunger wrote:
>
> Hi Frank,
>
>  something that occurred to me when reading up on ArangoDB:
>
> - default transaction mode is per document tx
> - default consistency mode is eventual
> - arangodb uses in-memory indexes
> - looking at the (long running) arangodb warmup code, it aggressively 
> pulls all data into memory
>
> are these statements correct?
>
> I'm just wondering, as Neo4j is a fully transactional database and 
> guarantees sync to disk on commit, whether this isn't an 
> apples-and-oranges comparison.
>
> Also, to save RAM, I didn't pull all the data into in-memory caches in 2.2.
>
> And I didn't understand the first-degree/second-degree neighbour 
> reasoning in the 2nd blog post. Why are there not 2 bars shown in the 
> chart, and how are they merged together?
>
> Also, I noticed that the chart at the beginning of the second blog post is 
> exactly the same as the one in the first, despite the statement above it 
> that "this was updated with the latest improvements".
>
> Michael, curious
>
> On 12.06.2015 at 12:19, Frank Celler <[email protected]> wrote:
>
> I've updated the database dump on Amazon S3 following Michael's suggestion. I 
> will rerun the tests as soon as Michael has finished his investigation.
>
>
> Best,
>
>   Frank
>
>
> On Thursday, 11 June 2015 17:01:20 UTC+2, Frank Celler wrote:
>>
>> It worked perfectly. 
>>
>> On Thursday, 11 June 2015 15:36:30 UTC+2, Michael Hunger wrote:
>>>
>>> you forgot --id-type integer
>>>
>>> the script actually takes care of it
>>>
>>> On 11.06.2015 at 14:55, Michael Hunger <[email protected]> wrote:
>>>
>>> I used 2.2.2, 2.3-M02 and 2.3-SNAPSHOT for the import.
>>>
>>> I can also provide you with the freshly imported databases. Let me know.
>>>
>>> Michael
>>>
>>> On 11.06.2015 at 14:45, Frank Celler <[email protected]> wrote:
>>>
>>> Hi Michael,
>>>
>>> thanks a lot for the import script. I'm currently trying to generate a 
>>> new database dump (with Neo4J 2.2.2 Community). But I get the following 
>>> error:
>>>
>>> $ bash -x ./import.sh 
>>> ...
>>> + rm -rf pokec.db
>>> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö 
>>> --nodes:PROFILES 
>>> profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz 
>>> --relationships:RELATION 
>>> relationships_header.txt,soc-pokec-relationships.txt.gz
>>> Exception in thread "main" java.lang.NullPointerException
>>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>>>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>>>
>>> My java is
>>>
>>> java version "1.8.0_45"
>>> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>>>
>>> Do I need 2.3 for the import?
>>>
>>> Thanks
>>>   Frank
>>>
>>> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>>>>
>>>> I created an import script which I added to my repository.
>>>> On my machine it imports the data in 35 seconds.
>>>>
>>>> It uses more sensible types for the fields and also skips all the 
>>>> null values.
>>>>
>>>> It also uses a numeric id for the primary key, which makes more sense to 
>>>> me.
>>>>
>>>> If you optimized the dataset for Neo4j, you could even use the node id 
>>>> as the primary key, since the input data has a sane, incrementing id; 
>>>> that would be way faster.
>>>>
>>>> I also added a neo4j-pokec directory with queries that use the numeric 
>>>> id as input (it should probably also use an input.json file that doesn't 
>>>> contain "Pxxx" strings; I'm not sure what the performance impact of 
>>>> converting those strings is).
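Dropping the "Pxxx" strings from the query input could be a one-off conversion step. A sketch, assuming the benchmark keys always look like "P/P1" (collection prefix, then "P" and the number, as in the export lines and sample data further down) and that the numeric part is the id used by the import; `numeric_id` is a hypothetical helper:

```python
import re

# Assumed key shape: optional "P/" collection prefix, then "P" and digits.
_KEY = re.compile(r"(?:P/)?P(\d+)")


def numeric_id(key: str) -> int:
    """Turn a profile key such as "P/P277" (or "P277") into the integer 277."""
    m = _KEY.fullmatch(key)
    if m is None:
        raise ValueError(f"unexpected profile key: {key!r}")
    return int(m.group(1))
```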
>>>>
>>>> Cheers, Michael
>>>>
>>>> https://github.com/jexp/nosql-tests/tree/my-import
>>>>
>>>> I did some preliminary testing
>>>>
>>>> Neo4j 2.2
>>>>
>>>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>>>> INFO using server address  127.0.0.1
>>>> INFO start
>>>> INFO executing shortest path for 19 paths
>>>> INFO total paths length: 104
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *shortest* path, 19 items
>>>> INFO Total Time for 19 requests: 85 ms
>>>> INFO Average: *4.47 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing neighbors for 500 elements
>>>> INFO total number of neighbors found: 9102
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *neighbors*, 500 items
>>>> INFO Total Time for 500 requests: 428 ms
>>>> INFO Average: *0.86 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing neighbors 2nd degree for 500 elements
>>>> INFO total number of neighbors2 found: 545530
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *neighbors2*, 500 items
>>>> INFO Total Time for 500 requests: 4850 ms
>>>> INFO Average: *9.7 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing aggregation
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *aggregate*, 1 items
>>>> INFO Total Time for 1 requests: 14036 ms
>>>> INFO Average: *14036 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing single read with 100000 documents
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *single reads*, 100000 items
>>>> INFO Total Time for 100000 requests: 83473 ms
>>>> INFO Average: *0.83 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>>
>>>>
>>>> Neo4j 2.3
>>>>
>>>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>>>> INFO using server address  127.0.0.1
>>>> INFO start
>>>> INFO executing shortest path for 19 paths
>>>> INFO total paths length: 104
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *shortest* path, 19 items
>>>> INFO Total Time for 19 requests: 69 ms
>>>> INFO Average: *3.63 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing neighbors for 500 elements
>>>> INFO total number of neighbors found: 9102
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *neighbors*, 500 items
>>>> INFO Total Time for 500 requests: 431 ms
>>>> INFO Average: *0.86 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing neighbors 2nd degree for 500 elements
>>>> INFO total number of neighbors2 found: 545530
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *neighbors2*, 500 items
>>>> INFO Total Time for 500 requests: 3441 ms
>>>> INFO Average: *6.88 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing aggregation
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *aggregate*, 1 items
>>>> INFO Total Time for 1 requests: 2848 ms
>>>> INFO Average: *2848 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO executing single read with 100000 documents
>>>> INFO -----------------------------------------------------------------------------
>>>> INFO Neo4J: *single reads*, 100000 items
>>>> INFO Total Time for 100000 requests: 77760 ms
>>>> INFO Average: *0.78 ms*
>>>> INFO -----------------------------------------------------------------------------
>>>> DONE
>>>>
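Each "Average" line in the two logs above is the total time divided by the request count. A small sketch that recomputes the averages from the logged totals and compares the two runs (preliminary, single-machine numbers; the helper names are hypothetical). By this arithmetic the aggregation went from 14036 ms to 2848 ms (about 4.9x faster) and neighbors2 improved by about 1.4x, while neighbors stayed flat.

```python
# (requests, total ms on 2.2, total ms on 2.3), copied from the two logs above.
RESULTS = {
    "shortest":   (19, 85, 69),
    "neighbors":  (500, 428, 431),
    "neighbors2": (500, 4850, 3441),
    "aggregate":  (1, 14036, 2848),
    "singleRead": (100000, 83473, 77760),
}


def average_ms(total_ms: float, requests: int) -> float:
    """Mean latency per request, as printed on the benchmark's Average lines."""
    return total_ms / requests


def speedup(old_total_ms: float, new_total_ms: float) -> float:
    """Ratio of old to new total time; values above 1.0 mean the newer run was faster."""
    return old_total_ms / new_total_ms


if __name__ == "__main__":
    for test, (n, t22, t23) in RESULTS.items():
        print(f"{test:<11} 2.2 avg {average_ms(t22, n):9.2f} ms   "
              f"2.3 avg {average_ms(t23, n):9.2f} ms   speedup {speedup(t22, t23):5.2f}x")
```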
>>>>
>>>> On 10.06.2015 at 18:55, Frank Celler <[email protected]> wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> thanks for sharing your preliminary findings. I'll incorporate them 
>>>> into the benchmark suite and rerun the tests. I've seen that there is a 
>>>> 30-day trial for the Enterprise edition, so I can test that as well.
>>>>
>>>> Is it possible to upload the database where you changed the AGE 
>>>> attribute? Or is there any easy cypher command to change the type?
>>>>
>>>> Thanks
>>>>   Frank
>>>>
>>>>
>>>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>>>>
>>>>> I also did some experiments but haven't had the time to finish yet; 
>>>>> here are my observations so far:
>>>>>
>>>>> *Arangodb Measurement*
>>>>>
>>>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key 
>>>>> IS UNIQUE;`
>>>>> - seraph -> replace with node-neo4j 2.0.RC1 
>>>>>   - uses 2 year old /cypher api, doesn't send X-Stream:true header
>>>>>   - does not do efficient auth (encode creds on every call)
>>>>>   - doesn't do pooling
>>>>> - suboptimal queries
>>>>> - make sure the concurrency level is adequate for the setup (utilize 
>>>>> all cores but don't flood, use e.g. async.eachWithLimit)
>>>>> - warmup with nodes and rels `MATCH ()--() return count(*);`
>>>>> - enterprise with better vertical read/write scalability vs. community
>>>>> - Use 12G-24G heap, 2G new gen (-Xmn2G)
>>>>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>>>>> - in 2.2 set cache_type = soft or cache_type=none depending on 
>>>>> available heap
>>>>> - fix property encoding, e.g. AGE as int not string, don't store 
>>>>> "null" !!
>>>>>   -> affects esp. aggregate query
>>>>> - don't re-run the benchmark on the same store, start at the initial 
>>>>> one
>>>>>   -> creating and deleting the additional PROFILES_TEMP nodes affects 
>>>>> repeatability of results
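The concurrency advice in the list above (use all cores but don't flood the server) is the classic bounded-parallelism pattern. The benchmark itself is Node.js; as a language-neutral sketch, here is the same idea in Python with `asyncio.Semaphore`. `each_with_limit` is a hypothetical name, and the right limit (for example the number of cores) is an assumption to tune per setup.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def each_with_limit(items: Iterable[T], limit: int,
                          worker: Callable[[T], Awaitable[R]]) -> List[R]:
    """Run worker(item) for every item with at most `limit` calls in flight.

    Results come back in input order, like asyncio.gather.
    """
    sem = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with sem:                 # waits while `limit` requests are running
            return await worker(item)

    return await asyncio.gather(*(run(i) for i in items))
```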
>>>>>
>>>>> correct datatypes:
>>>>>
>>>>> * "null" should *never be stored*
>>>>> * int: public, gender, completion_percentage, AGE,
>>>>> * long/time: last_login, registration 
>>>>> * optionally as label: gender, public
>>>>>
>>>>>   -> test repository (WIP): with changes in *description.js and 
>>>>> benchmark.js*
>>>>>
>>>>> https://github.com/jexp/nosql-tests/tree/node-neo4j
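The datatype rules above can be applied to each raw profile before import. A sketch, assuming the raw values look like the sample data further down ("null" strings, "2012-05-25 11:20:00.0" timestamps) and that timestamps are UTC, which the dump does not state; `clean_profile` and the field sets are hypothetical names:

```python
from datetime import datetime, timezone
from typing import Any, Dict

INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"        # matches e.g. "2012-05-25 11:20:00.0"


def clean_profile(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Drop "null" values, store ints as ints and timestamps as epoch milliseconds."""
    out: Dict[str, Any] = {}
    for name, value in raw.items():
        if value is None or value == "null":
            continue                        # "null" should never be stored
        if name in INT_FIELDS:
            out[name] = int(value)
        elif name in TIME_FIELDS:
            # Assumption: the source timestamps are UTC.
            ts = datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)
            out[name] = int(ts.timestamp() * 1000)
        else:
            out[name] = value               # everything else stays a string
    return out
```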
>>>>>
>>>>> queries for neo4j-shell:
>>>>>
>>>>> export from="P/P1"
>>>>> export to="P/P277"
>>>>>
>>>>> export key="P/P1"
>>>>>
>>>>> // warmup
>>>>> MATCH ()--() return count(*);
>>>>> // 61.245.128 rows
>>>>>
>>>>> MATCH (s:PROFILES) return count(*);
>>>>> // 1.632.803 profiles
>>>>> // 1.15 s
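The warmup query uses an undirected pattern, `()--()`, so each relationship between two distinct nodes is matched twice, once from each end. A small sketch of that counting, assuming no self-loops (the helper names are hypothetical): under that assumption the 61,245,128 warmup rows correspond to 30,622,564 relationships.

```python
from typing import Iterable, Iterator, Tuple


def undirected_bindings(relationships: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int, str]]:
    """Yield the (a, rel, b) bindings an undirected pattern (a)-[rel]-(b) produces.

    Assumes no self-loops: each directed relationship start->end is bound
    once in each direction.
    """
    for rel_id, (start, end) in enumerate(relationships):
        yield (start, rel_id, end)
        yield (end, rel_id, start)


def relationship_count(undirected_rows: int) -> int:
    """Undo the doubling of an undirected ()--() row count (no self-loops assumed)."""
    if undirected_rows % 2:
        raise ValueError("odd row count: not a plain undirected count without self-loops")
    return undirected_rows // 2
```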
>>>>>
>>>>> profile
>>>>>
>>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT 
>>>>> n._key;
>>>>> // 295 rows 5 ms
>>>>>
>>>>>
>>>>> // 1st degree neighbours
>>>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>>>> // 14 rows 1ms 
>>>>>
>>>>> // 2nd degree neighbours
>>>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>>>> MATCH (x)-->(n:PROFILES)
>>>>> RETURN DISTINCT n._key;
>>>>> // 283 rows 6 ms
>>>>>
>>>>> // shortest path
>>>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
>>>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as 
>>>>> path;
>>>>> // 1 ms; don't return the full data, only the keys, like in the other DBs
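`shortestPath((s)-[*..15]->(t))` asks for one shortest directed path of at most 15 hops. A breadth-first-search sketch of the same idea over an in-memory adjacency map, returning only the keys as the query above does. `shortest_path` is hypothetical; it returns the first shortest path it finds, and Cypher may pick a different one when several exist.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], source: str, target: str,
                  max_hops: int = 15) -> Optional[List[str]]:
    """Keys along one shortest directed path from source to target, or None."""
    if source == target:
        return [source]
    parent: Dict[str, Optional[str]] = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue                        # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt in parent:
                continue                    # already reached by a path no longer than this one
            parent[nxt] = node
            if nxt == target:
                # Walk the parent links back to the source.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((nxt, depth + 1))
    return None
```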
>>>>>
>>>>> // aggregation
>>>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>>>> // 22 s -> should rather be 1.5 s
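Functionally, the aggregation query is a count per distinct AGE value. A sketch of that semantics over in-memory dicts (`count_by_age` is hypothetical). It also shows why a single property type matters for grouping: in this sketch a string "26" and an integer 26 end up in separate groups, and profiles without AGE share one `None` group.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_age(profiles: Iterable[Dict[str, Any]]) -> Counter:
    """One count per distinct AGE value; a missing AGE is grouped under None."""
    return Counter(p.get("AGE") for p in profiles)
```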
>>>>>
>>>>> // single read
>>>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>>>> // or
>>>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>>>> // 1 row with 59 properties 1 ms
>>>>>
>>>>> // single writes
>>>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>>>>
>>>>> // delete all nodes with a certain label
>>>>> // loop until returns 0
>>>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() 
>>>>> DELETE n,r RETURN count(*) as deleted
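The "loop until returns 0" instruction describes a client-side driver loop around the batched DELETE. A minimal sketch, where `delete_batch` is a hypothetical callable standing for "run the query above with LIMIT `batch_size` and return its `deleted` column"; the fake in the test stands in for the database.

```python
from typing import Callable


def delete_in_batches(delete_batch: Callable[[int], int], batch_size: int = 5000) -> int:
    """Call delete_batch(batch_size) until it reports 0; return the summed counts."""
    total = 0
    while True:
        deleted = delete_batch(batch_size)  # one bounded transaction per call
        if deleted == 0:
            return total
        total += deleted
```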
>>>>> ----
>>>>>
>>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT 
>>>>> n._key as key RETURN count(*);
>>>>> // 295 count 5-6ms
>>>>>
>>>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>>>> // 140 rows -> 1502 ms that's how it should be
>>>>>
>>>>> sample data:
>>>>>
>>>>> _key:"P/P1",
>>>>> public:"1",
>>>>> completion_percentage:"14",
>>>>> gender:"1",
>>>>> region:"zilinsky kraj, zilina",
>>>>> last_login:"2012-05-25 11:20:00.0",
>>>>> registration:"2005-04-03 00:00:00.0",
>>>>> AGE:26,
>>>>> body:"185 cm, 90 kg",
>>>>> I_am_working_in_field:"it",
>>>>> spoken_languages:"anglicky",
>>>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, 
>>>>> divadlo",
>>>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>>>> pets:"mam psa",
>>>>> body_type:"null",
>>>>> my_eyesight:"null",
>>>>> eye_color:"null",
>>>>> hair_color:"null",
>>>>> hair_type:"null",
>>>>> completed_level_of_education:"null",
>>>>> favourite_color:"null",
>>>>> relation_to_smoking:"null",
>>>>> relation_to_alcohol:"null",
>>>>> sign_in_zodiac:"null",
>>>>> on_pokec_i_am_looking_for:"null",
>>>>> love_is_for_me:"null",
>>>>> relation_to_casual_sex:"null",
>>>>> my_partner_should_be:"null",
>>>>> marital_status:"null",
>>>>> children:"null",
>>>>> relation_to_children:"null",
>>>>> I_like_movies:"null",
>>>>> I_like_watching_movie:"null",
>>>>> I_like_music:"null",
>>>>> I_mostly_like_listening_to_music:"null",
>>>>> the_idea_of_good_evening:"null",
>>>>> I_like_specialties_from_kitchen:"null",
>>>>> fun:"null",
>>>>> I_am_going_to_concerts:"null",
>>>>> my_active_sports:"null",
>>>>> my_passive_sports:"null",
>>>>> profession:"null",
>>>>> I_like_books:"null",
>>>>> life_style:"null",
>>>>> music:"null",
>>>>> cars:"null",
>>>>> politics:"null",
>>>>> relationships:"null",
>>>>> art_culture:"null",
>>>>> hobbies_interests:"null",
>>>>> science_technologies:"null",
>>>>> computers_internet:"null",
>>>>> education:"null",
>>>>> sport:"null",
>>>>> movies:"null",
>>>>> travelling:"null",
>>>>> health:"null",
>>>>> companies_brands:"null",
>>>>> more:"null"
>>>>>
>>>>>
>>>>> neo4j-server.properties:
>>>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>>>> org.neo4j.server.webserver.port=8474
>>>>> dbms.security.auth_enabled=false
>>>>>
>>>>>
>>>>> neo4j-wrapper.conf:
>>>>> wrapper.java.initmemory=8000
>>>>> wrapper.java.maxmemory=8000
>>>>> wrapper.java.additional=-Xmn2G
>>>>>
>>>>> neo4j.properties:
>>>>> dbms.pagecache.memory=5G
>>>>> keep_logical_logs=false
>>>>> remote_shell_enabled=false
>>>>> cache_type=soft
>>>>> online_backup_enabled=false
>>>>>
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>