Re: [Neo4j] ArangoDB vs. Neo4j -- what's up? article of Jun 04, 2015

Michael Hunger Fri, 12 Jun 2015 05:30:03 -0700

Hi Frank,

Something that occurred to me when reading up on ArangoDB:


- the default transaction mode is per-document tx
- the default consistency mode is eventual
- ArangoDB uses in-memory indexes
- judging by the (long-running) ArangoDB warmup code, it aggressively pulls all 
data into memory

Are these statements correct?

I'm just wondering, since Neo4j is a fully transactional database and 
guarantees sync to disk on commit, whether that isn't an apples-to-oranges 
comparison.

Also, to save RAM, I didn't pull all the data into in-memory caches in 2.2.

I also didn't understand the first-degree / second-degree neighbour reasoning 
in the 2nd blog post.
Why aren't 2 bars shown in the chart, and why are they merged together instead 
(how?)

I also noticed that the chart at the beginning of the second blog post is 
exactly the same as the one in the first, despite the statement above it that 
"this was updated with the latest improvements".

Michael, curious

> On 12.06.2015 at 12:19, Frank Celler <[email protected]> wrote:
> 
> I've updated the database dump on Amazon S3 following Michael's suggestion. I 
> will rerun the tests as soon as Michael has finished his investigation.
> 
> Best,
>   Frank
> 
> On Thursday, 11 June 2015 17:01:20 UTC+2, Frank Celler wrote:
> It worked perfectly. 
> 
> On Thursday, 11 June 2015 15:36:30 UTC+2, Michael Hunger wrote:
> you forgot --id-type integer
> 
> the script actually takes care of it
> 
>> On 11.06.2015 at 14:55, Michael Hunger <[email protected]> wrote:
>> 
>> I used both 2.2.2 and 2.3-M02 and 2.3-SNAPSHOT for the import.
>> 
>> I can also provide you with the freshly imported databases. Let me know.
>> 
>> Michael
>> 
>>> On 11.06.2015 at 14:45, Frank Celler <[email protected]> wrote:
>>> 
>>> Hi Michael,
>>> 
>>> thanks a lot for the import script. I'm currently trying to generate a new 
>>> database dump (with Neo4J 2.2.2 Community). But I get the following error:
>>> 
>>> $ bash -x ./import.sh 
>>> ...
>>> + rm -rf pokec.db
>>> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö 
>>> --nodes:PROFILES 
>>> profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz 
>>> --relationships:RELATION 
>>> relationships_header.txt,soc-pokec-relationships.txt.gz
>>> Exception in thread "main" java.lang.NullPointerException
>>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>>>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>>> 
>>> My java is
>>> 
>>> java version "1.8.0_45"
>>> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>>> 
>>> Do I need 2.3 for the import?
>>> 
>>> Thanks
>>>   Frank
>>> 
>>> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>>> I created an import script and added it to my repository.
>>> On my machine it imports the data in 35 seconds.
>>> 
>>> It uses more sensible types for the fields and also skips all the null 
>>> values.
>>> 
>>> It also uses a numeric id for the primary key, which makes more sense to me.
>>> 
>>> If you optimized the dataset for Neo4j, you could even use the node-id as 
>>> the primary key, since the input data has a sane, incrementing id; that 
>>> would be way faster.
>>> 
>>> I also added a neo4j-pokec directory with queries that use the numeric id as 
>>> input (it should probably also use an input.json file that doesn't contain 
>>> "Pxxx" strings; I'm not sure what the perf impact of converting those 
>>> strings is).
>>> 
>>> Cheers, Michael
>>> 
>>> https://github.com/jexp/nosql-tests/tree/my-import
>>> 
>>> I did some preliminary testing
>>> 
>>> Neo4j 2.2
>>> 
>>> node benchmark.js neo4j-pokec -t 
>>> shortest,neighbors,neighbors2,aggregation,singleRead 
>>> INFO using server address  127.0.0.1
>>> INFO start
>>> INFO executing shortest path for 19 paths
>>> INFO total paths length: 104
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: shortest path, 19 items
>>> INFO Total Time for 19 requests: 85 ms
>>> INFO Average: 4.47 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing neighbors for 500 elements
>>> INFO total number of neighbors found: 9102
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: neighbors, 500 items
>>> INFO Total Time for 500 requests: 428 ms
>>> INFO Average: 0.86 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing neighbors 2nd degree for 500 elements
>>> INFO total number of neighbors2 found: 545530
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: neighbors2, 500 items
>>> INFO Total Time for 500 requests: 4850 ms
>>> INFO Average: 9.7 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing aggregation
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: aggregate, 1 items
>>> INFO Total Time for 1 requests: 14036 ms
>>> INFO Average: 14036 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing single read with 100000 documents
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: single reads, 100000 items
>>> INFO Total Time for 100000 requests: 83473 ms
>>> INFO Average: 0.83 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> 
>>> 
>>> Neo4j 2.3
>>> 
>>>  node benchmark.js neo4j-pokec -t 
>>> shortest,neighbors,neighbors2,aggregation,singleRead 
>>> INFO using server address  127.0.0.1
>>> INFO start
>>> INFO executing shortest path for 19 paths
>>> INFO total paths length: 104
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: shortest path, 19 items
>>> INFO Total Time for 19 requests: 69 ms
>>> INFO Average: 3.63 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing neighbors for 500 elements
>>> INFO total number of neighbors found: 9102
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: neighbors, 500 items
>>> INFO Total Time for 500 requests: 431 ms
>>> INFO Average: 0.86 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing neighbors 2nd degree for 500 elements
>>> INFO total number of neighbors2 found: 545530
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: neighbors2, 500 items
>>> INFO Total Time for 500 requests: 3441 ms
>>> INFO Average: 6.88 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing aggregation
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: aggregate, 1 items
>>> INFO Total Time for 1 requests: 2848 ms
>>> INFO Average: 2848 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO executing single read with 100000 documents
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> INFO Neo4J: single reads, 100000 items
>>> INFO Total Time for 100000 requests: 77760 ms
>>> INFO Average: 0.78 ms
>>> INFO 
>>> -----------------------------------------------------------------------------
>>> DONE
>>> 
>>> 
>>>> On 10.06.2015 at 18:55, Frank Celler <fce...@gmail.com> wrote:
>>>> 
>>>> Hi Michael,
>>>> 
>>>> thanks for sharing your preliminary findings. I'll incorporate them into 
>>>> the benchmark suite and rerun the tests. I've seen that there is a 30-day 
>>>> trial for the enterprise edition, so I can test that as well.
>>>> 
>>>> Is it possible to upload the database where you changed the AGE attribute? 
>>>> Or is there an easy Cypher command to change the type?
>>>> 
>>>> Thanks
>>>>   Frank
>>>> 
>>>> 
>>>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>>> I also did some experiments but haven't had the time to finish yet; here 
>>>> are my observations so far:
>>>> 
>>>> ArangoDB Measurement
>>>> 
>>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS 
>>>> UNIQUE;`
>>>> - seraph -> replace with node-neo4j 2.0.RC1 
>>>>   - uses 2 year old /cypher api, doesn't send X-Stream:true header
>>>>   - does not do efficient auth (encode creds on every call)
>>>>   - doesn't do pooling
>>>> - suboptimal queries
>>>> - make sure the concurrency level is adequate for the setup (utilize all 
>>>> cores but don't flood, use e.g. async.eachWithLimit)
>>>> - warmup with nodes and rels `MATCH ()--() return count(*);`
>>>> - enterprise with better vertical read/write scalability vs. community
>>>> - Use 12G-24G heap, 2G new gen (-Xmn2G)
>>>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>>>> - in 2.2 set cache_type = soft or cache_type=none depending on available 
>>>> heap
>>>> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>>>>   -> affects esp. aggregate query
>>>> - don't re-run the benchmark on the same store, start at the initial one
>>>>   -> creating and deleting the additional PROFILES_TEMP nodes affects 
>>>> repeatability of results
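
The concurrency point in the list above (use all cores without flooding the server) can be sketched as follows. The benchmark itself runs on Node, where async.eachWithLimit was the suggestion; this Python version swaps in a thread pool as a stand-in, and `run_with_limit` and its parameters are hypothetical names, not part of the benchmark suite.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(task, items, limit):
    """Apply task to every item with at most `limit` calls in flight.

    `task` is any callable that performs one request; it is a placeholder
    for whatever client call the benchmark would make.
    """
    # The pool size is the concurrency cap: extra items wait in the queue
    # instead of being fired at the server all at once.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # map() returns results in input order, whatever the finish order.
        return list(pool.map(task, items))
```

A reasonable starting value for `limit` would be the number of cores on the database host, adjusted after measuring.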
>>>> 
>>>> correct datatypes:
>>>> 
>>>> * "null" should *never be stored*
>>>> * int: public, gender, completion_percentage, AGE,
>>>> * long/time: last_login, registration 
>>>> * optionally as label: gender, public
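
A minimal sketch of the type fixes listed above, applied to one record before import. The field groups come from the list; the helper names are hypothetical, and treating the timestamps as UTC is an assumption, since the dataset does not state a time zone.

```python
from datetime import datetime, timezone

# Field groups follow the list above; the names below are illustrative only.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}


def to_epoch_millis(text):
    """Parse 'YYYY-MM-DD HH:MM:SS.f' (assumed UTC) into epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


def clean_profile(raw):
    """Return a copy of raw without "null" values and with typed fields."""
    cleaned = {}
    for name, value in raw.items():
        if value is None or value == "null":
            continue  # "null" is dropped, never stored
        if name in INT_FIELDS:
            cleaned[name] = int(value)
        elif name in TIME_FIELDS:
            cleaned[name] = to_epoch_millis(value)
        else:
            cleaned[name] = value
    return cleaned
```

Run against the sample record further down, this would keep only the populated properties, which also shrinks the store that the aggregation query has to scan.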
>>>> 
>>>>   -> test repository (WIP): with changes in description.js and benchmark.js
>>>> 
>>>> https://github.com/jexp/nosql-tests/tree/node-neo4j 
>>>> <https://github.com/jexp/nosql-tests/tree/node-neo4j>
>>>> 
>>>> queries for neo4j-shell:
>>>> 
>>>> export from="P/P1"
>>>> export to="P/P277"
>>>> 
>>>> export key="P/P1"
>>>> 
>>>> // warmup
>>>> MATCH ()--() return count(*);
>>>> // 61.245.128 rows
>>>> 
>>>> MATCH (s:PROFILES) return count(*);
>>>> // 1.632.803 profiles
>>>> // 1.15 s
>>>> 
>>>> profile
>>>> 
>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT 
>>>> n._key;
>>>> // 295 rows 5 ms
>>>> 
>>>> 
>>>> // 1st degree neighbours
>>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>>> // 14 rows 1ms 
>>>> 
>>>> // 2nd degree neighbours
>>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>>> MATCH (x)-->(n:PROFILES)
>>>> RETURN DISTINCT n._key;
>>>> // 283 rows 6 ms
>>>> 
>>>> // shortest path
>>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
>>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>>>> // 1 ms; don't return the full data, only the keys, as in the other DBs
>>>> 
>>>> // aggregation
>>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>>> // 22s -> should be rather 1.5s
>>>> 
>>>> // single read
>>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>>> // or
>>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>>> // 1 row with 59 properties 1 ms
>>>> 
>>>> // single writes
>>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>>> 
>>>> // delete all nodes with a certain label
>>>> // loop until returns 0
>>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE 
>>>> n,r RETURN count(*) as deleted
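
The "loop until returns 0" step above could be driven by a small client-side loop like the one below. This is only a sketch: `run_query` is a hypothetical callable standing in for whatever driver or HTTP client is used, and it is assumed to return the integer from the query's `deleted` column.

```python
DELETE_BATCH = (
    "MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 "
    "OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) AS deleted"
)


def delete_in_batches(run_query, max_rounds=10_000):
    """Run DELETE_BATCH until it reports 0 and return the summed counts.

    max_rounds is a safety cap (an assumption, not from the original)
    so a misbehaving query cannot loop forever.
    """
    total = 0
    for _ in range(max_rounds):
        deleted = run_query(DELETE_BATCH)
        if deleted == 0:
            return total
        total += deleted
    raise RuntimeError("batch delete did not finish within max_rounds")
```

Deleting in batches of 5000 keeps each transaction small, which is the reason for the LIMIT in the query.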
>>>> ----
>>>> 
>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key 
>>>> as key RETURN count(*);
>>>> // 295 count 5-6ms
>>>> 
>>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>>> // 140 rows -> 1502 ms that's how it should be
>>>> 
>>>> sample data:
>>>> 
>>>> _key:"P/P1",
>>>> public:"1",
>>>> completion_percentage:"14",
>>>> gender:"1",
>>>> region:"zilinsky kraj, zilina",
>>>> last_login:"2012-05-25 11:20:00.0",
>>>> registration:"2005-04-03 00:00:00.0",
>>>> AGE:26,
>>>> body:"185 cm, 90 kg",
>>>> I_am_working_in_field:"it",
>>>> spoken_languages:"anglicky",
>>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, 
>>>> divadlo",
>>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>>> pets:"mam psa",
>>>> body_type:"null",
>>>> my_eyesight:"null",
>>>> eye_color:"null",
>>>> hair_color:"null",
>>>> hair_type:"null",
>>>> completed_level_of_education:"null",
>>>> favourite_color:"null",
>>>> relation_to_smoking:"null",
>>>> relation_to_alcohol:"null",
>>>> sign_in_zodiac:"null",
>>>> on_pokec_i_am_looking_for:"null",
>>>> love_is_for_me:"null",
>>>> relation_to_casual_sex:"null",
>>>> my_partner_should_be:"null",
>>>> marital_status:"null",
>>>> children:"null",
>>>> relation_to_children:"null",
>>>> I_like_movies:"null",
>>>> I_like_watching_movie:"null",
>>>> I_like_music:"null",
>>>> I_mostly_like_listening_to_music:"null",
>>>> the_idea_of_good_evening:"null",
>>>> I_like_specialties_from_kitchen:"null",
>>>> fun:"null",
>>>> I_am_going_to_concerts:"null",
>>>> my_active_sports:"null",
>>>> my_passive_sports:"null",
>>>> profession:"null",
>>>> I_like_books:"null",
>>>> life_style:"null",
>>>> music:"null",
>>>> cars:"null",
>>>> politics:"null",
>>>> relationships:"null",
>>>> art_culture:"null",
>>>> hobbies_interests:"null",
>>>> science_technologies:"null",
>>>> computers_internet:"null",
>>>> education:"null",
>>>> sport:"null",
>>>> movies:"null",
>>>> travelling:"null",
>>>> health:"null",
>>>> companies_brands:"null",
>>>> more:"null"
>>>> 
>>>> 
>>>> neo4j-server.properties:
>>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>>> org.neo4j.server.webserver.port=8474
>>>> dbms.security.auth_enabled=false
>>>> 
>>>> 
>>>> neo4j-wrapper.conf:
>>>> wrapper.java.initmemory=8000
>>>> wrapper.java.maxmemory=8000
>>>> wrapper.java.additional=-Xmn2G
>>>> 
>>>> neo4j.properties:
>>>> dbms.pagecache.memory=5G
>>>> keep_logical_logs=false
>>>> remote_shell_enabled=false
>>>> cache_type=soft
>>>> online_backup_enabled=false
>>>> 
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to neo4j+un...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> 
>> 
> 
> 

