Re: [Neo4j] ArangoDB vs. Neo4j -- what's up? article of Jun 04, 2015

Michael Hunger Thu, 11 Jun 2015 06:37:20 -0700

You forgot the value for --id-type: it has to be --id-type integer.

The script actually takes care of it.


> On 11.06.2015 at 14:55, Michael Hunger wrote:
> 
> I used 2.2.2, 2.3-M02 and 2.3-SNAPSHOT for the import.
> 
> I can also provide you with the freshly imported databases. Let me know.
> 
> Michael
> 
>> On 11.06.2015 at 14:45, Frank Celler wrote:
>> 
>> Hi Michael,
>> 
>> Thanks a lot for the import script. I'm currently trying to generate a new 
>> database dump (with Neo4j 2.2.2 Community), but I get the following error:
>> 
>> $ bash -x ./import.sh 
>> ...
>> + rm -rf pokec.db
>> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö 
>> --nodes:PROFILES 
>> profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz 
>> --relationships:RELATION 
>> relationships_header.txt,soc-pokec-relationships.txt.gz
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>> 
>> My Java version is:
>> 
>> java version "1.8.0_45"
>> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>> 
>> Do I need 2.3 for the import?
>> 
>> Thanks
>>   Frank
>> 
>> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>> I created an import script which I added to my repository.
>> On my machine it imports the data in 35 seconds.
>> 
>> It uses more sensible types for the fields and also skips all the null 
>> values.
>> 
>> It also uses a numeric id for the primary key, which makes more sense to me.
>> 
>> If you optimized the dataset for Neo4j you could even use the node id as 
>> the primary key: the input data has a sane, incrementing id, so that would 
>> be way faster.
>> 
>> I also added a neo4j-pokec directory with queries that use that numeric id 
>> as input (it probably should also use an input.json file that doesn't 
>> contain "Pxxx" strings; I'm not sure what the perf impact of converting 
>> those strings is).
>> 
>> Cheers, Michael
>> 
>> https://github.com/jexp/nosql-tests/tree/my-import
>> 
>> I did some preliminary testing
>> 
>> Neo4j 2.2
>> 
>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address  127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: shortest path, 19 items
>> INFO Total Time for 19 requests: 85 ms
>> INFO Average: 4.47 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors, 500 items
>> INFO Total Time for 500 requests: 428 ms
>> INFO Average: 0.86 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors2, 500 items
>> INFO Total Time for 500 requests: 4850 ms
>> INFO Average: 9.7 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: aggregate, 1 items
>> INFO Total Time for 1 requests: 14036 ms
>> INFO Average: 14036 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: single reads, 100000 items
>> INFO Total Time for 100000 requests: 83473 ms
>> INFO Average: 0.83 ms
>> INFO -----------------------------------------------------------------------------
>> 
>> 
>> Neo4j 2.3
>> 
>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address  127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: shortest path, 19 items
>> INFO Total Time for 19 requests: 69 ms
>> INFO Average: 3.63 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors, 500 items
>> INFO Total Time for 500 requests: 431 ms
>> INFO Average: 0.86 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors2, 500 items
>> INFO Total Time for 500 requests: 3441 ms
>> INFO Average: 6.88 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: aggregate, 1 items
>> INFO Total Time for 1 requests: 2848 ms
>> INFO Average: 2848 ms
>> INFO -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: single reads, 100000 items
>> INFO Total Time for 100000 requests: 77760 ms
>> INFO Average: 0.78 ms
>> INFO -----------------------------------------------------------------------------
>> DONE
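
To make the two runs above easier to compare, here is a small sketch that computes the per-test ratio of total times. The numbers are copied from the logs above; the object layout and the function name `speedups` are only illustrative and not part of the benchmark suite.

```javascript
// Total times in ms, copied from the two benchmark logs above.
const totals = {
  shortest:    { v22: 85,    v23: 69 },
  neighbors:   { v22: 428,   v23: 431 },
  neighbors2:  { v22: 4850,  v23: 3441 },
  aggregation: { v22: 14036, v23: 2848 },
  singleRead:  { v22: 83473, v23: 77760 },
};

// Ratio of the 2.2 total to the 2.3 total, rounded to two decimals.
// A value above 1 means the 2.3 run was faster for that test.
function speedups(data) {
  const result = {};
  for (const [test, t] of Object.entries(data)) {
    result[test] = Number((t.v22 / t.v23).toFixed(2));
  }
  return result;
}

console.log(speedups(totals));
```

These are single runs on one machine, so the ratios are only a rough indication. The aggregation query shows by far the largest difference and the neighbors test is essentially unchanged.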
>> 
>> 
>>> On 10.06.2015 at 18:55, Frank Celler <fce...@gmail.com> wrote:
>>> 
>>> Hi Michael,
>>> 
>>> Thanks for sharing your preliminary findings. I'll incorporate them into 
>>> the benchmark suite and rerun the tests. I've seen that there is a 30-day 
>>> trial for the enterprise edition, so I can test that as well.
>>> 
>>> Is it possible to upload the database where you changed the AGE attribute? 
>>> Or is there an easy Cypher command to change the type?
>>> 
>>> Thanks
>>>   Frank
>>> 
>>> 
>>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>> I also did some experiments but haven't had the time to finish yet. Here 
>>> are my observations so far:
>>> 
>>> ArangoDB Measurement
>>> 
>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
>>> - seraph -> replace with node-neo4j 2.0.RC1
>>>   - uses the 2-year-old /cypher API, doesn't send the X-Stream:true header
>>>   - does not do efficient auth (encodes creds on every call)
>>>   - doesn't do pooling
>>> - suboptimal queries
>>> - make sure the concurrency level is adequate for the setup (utilize all cores but don't flood, use e.g. async.eachLimit)
>>> - warm up with nodes and rels: `MATCH ()--() return count(*);`
>>> - enterprise has better vertical read/write scalability than community
>>> - use 12G-24G heap, 2G new gen (-Xmn2G)
>>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>>> - in 2.2 set cache_type=soft or cache_type=none depending on available heap
>>> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>>>   -> affects esp. the aggregate query
>>> - don't re-run the benchmark on the same store, start at the initial one
>>>   -> creating and deleting the additional PROFILES_TEMP nodes affects repeatability of results
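
The concurrency advice in the list above (use all cores but don't flood the server) can be sketched without any library. This is a minimal, hypothetical helper: `runWithLimit` is a local function written for illustration, not an API of the async library or of the benchmark suite, and `worker` stands in for whatever sends one request.

```javascript
// Run `worker(item)` for every item with at most `limit` calls in flight.
// Resolves with the results in input order.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each runner keeps pulling the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  // Start at most `limit` runners; together they drain the whole list.
  const runners = [];
  for (let r = 0; r < Math.min(limit, items.length); r++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

A call such as `runWithLimit(keys, 8, sendSingleRead)` would keep eight requests open at a time. Both `keys` and `sendSingleRead` are placeholders, and the right limit depends on the machine.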
>>> 
>>> correct datatypes:
>>> 
>>> * "null" should *never be stored*
>>> * int: public, gender, completion_percentage, AGE,
>>> * long/time: last_login, registration 
>>> * optionally as label: gender, public
>>> 
>>>   -> test repository (WIP): with changes in description.js and benchmark.js
>>> 
>>> https://github.com/jexp/nosql-tests/tree/node-neo4j
>>> 
>>> queries for neo4j-shell:
>>> 
>>> export from="P/P1"
>>> export to="P/P277"
>>> 
>>> export key="P/P1"
>>> 
>>> // warmup
>>> MATCH ()--() return count(*);
>>> // 61.245.128 rows
>>> 
>>> MATCH (s:PROFILES) return count(*);
>>> // 1.632.803 profiles
>>> // 1.15 s
>>> 
>>> profile
>>> 
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
>>> // 295 rows 5 ms
>>> 
>>> 
>>> // 1st degree neighbours
>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>> // 14 rows 1ms 
>>> 
>>> // 2nd degree neighbours
>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>> MATCH (x)-->(n:PROFILES)
>>> RETURN DISTINCT n._key;
>>> // 283 rows 6 ms
>>> 
>>> // shortest path
>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>>> // 1 ms; don't return the full data, only the keys, as for the other DBs
>>> 
>>> // aggregation
>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>> // 22s -> should be rather 1.5s
>>> 
>>> // single read
>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>> // or
>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>> // 1 row with 59 properties 1 ms
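
For reference, a hedged sketch of how the parameterized single-read query above could be sent from Node.js with the points from the client notes earlier in this message applied: credentials encoded once, a keep-alive agent for connection reuse, and the X-Stream header. It assumes the Neo4j 2.x transactional HTTP endpoint (`/db/data/transaction/commit`). The host and port are taken from the log and config in this thread, the credentials are placeholders, and whether X-Stream matters on this endpoint is not something the thread states.

```javascript
const http = require('http');

// Assumptions: host/port come from the log and config in this thread;
// the credentials are placeholders (auth is disabled in that config).
const HOST = '127.0.0.1';
const PORT = 8474;
// Encoded once and reused, instead of on every call.
const AUTH = 'Basic ' + Buffer.from('neo4j:secret').toString('base64');

// Keep-alive agent so connections are pooled rather than reopened per call.
const agent = new http.Agent({ keepAlive: true, maxSockets: 8 });

// Builds the request options and JSON body for one parameterized statement.
function buildRequest(statement, parameters) {
  const body = JSON.stringify({ statements: [{ statement, parameters }] });
  const options = {
    host: HOST,
    port: PORT,
    method: 'POST',
    agent,
    path: '/db/data/transaction/commit',
    headers: {
      'Content-Type': 'application/json',
      'Accept': 'application/json; charset=UTF-8',
      'X-Stream': 'true',
      'Authorization': AUTH,
      'Content-Length': Buffer.byteLength(body),
    },
  };
  return { options, body };
}

// Sends a built request; not called here, so the sketch runs without a server.
function send({ options, body }, callback) {
  const req = http.request(options, (res) => {
    let text = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { text += chunk; });
    res.on('end', () => callback(null, JSON.parse(text)));
  });
  req.on('error', callback);
  req.end(body);
}

const single = buildRequest('MATCH (s:PROFILES {_key:{key}}) RETURN s', { key: 'P/P1' });
console.log(single.options.path, single.body);
```

Passing the key as a parameter rather than concatenating it into the query string keeps the statement text identical across calls.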
>>> 
>>> // single writes
>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>> 
>>> // delete all nodes with a certain label
>>> // loop until returns 0
>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) as deleted;
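
The "loop until returns 0" instruction above can be sketched as follows. `runQuery` is a hypothetical function (not from the benchmark suite) that takes a Cypher string and returns a Promise resolving to the `deleted` value of that batch.

```javascript
// The batched delete statement from the message above.
const DELETE_BATCH =
  'MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 ' +
  'OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) as deleted';

// Repeats the batch until the query reports 0, then returns the summed counts.
// Note: count(*) counts result rows, so the sum equals the number of deleted
// nodes only when the temp nodes have at most one relationship each.
async function deleteAllTemp(runQuery) {
  let total = 0;
  for (;;) {
    const deleted = await runQuery(DELETE_BATCH);
    if (deleted === 0) return total;
    total += deleted;
  }
}
```

Deleting in batches of 5000 keeps each transaction small, which is presumably why the original statement uses a LIMIT.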
>>> ----
>>> 
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key as key RETURN count(*);
>>> // 295 count 5-6ms
>>> 
>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>> // 140 rows -> 1502 ms that's how it should be
>>> 
>>> sample data:
>>> 
>>> _key:"P/P1",
>>> public:"1",
>>> completion_percentage:"14",
>>> gender:"1",
>>> region:"zilinsky kraj, zilina",
>>> last_login:"2012-05-25 11:20:00.0",
>>> registration:"2005-04-03 00:00:00.0",
>>> AGE:26,
>>> body:"185 cm, 90 kg",
>>> I_am_working_in_field:"it",
>>> spoken_languages:"anglicky",
>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>> pets:"mam psa",
>>> body_type:"null",
>>> my_eyesight:"null",
>>> eye_color:"null",
>>> hair_color:"null",
>>> hair_type:"null",
>>> completed_level_of_education:"null",
>>> favourite_color:"null",
>>> relation_to_smoking:"null",
>>> relation_to_alcohol:"null",
>>> sign_in_zodiac:"null",
>>> on_pokec_i_am_looking_for:"null",
>>> love_is_for_me:"null",
>>> relation_to_casual_sex:"null",
>>> my_partner_should_be:"null",
>>> marital_status:"null",
>>> children:"null",
>>> relation_to_children:"null",
>>> I_like_movies:"null",
>>> I_like_watching_movie:"null",
>>> I_like_music:"null",
>>> I_mostly_like_listening_to_music:"null",
>>> the_idea_of_good_evening:"null",
>>> I_like_specialties_from_kitchen:"null",
>>> fun:"null",
>>> I_am_going_to_concerts:"null",
>>> my_active_sports:"null",
>>> my_passive_sports:"null",
>>> profession:"null",
>>> I_like_books:"null",
>>> life_style:"null",
>>> music:"null",
>>> cars:"null",
>>> politics:"null",
>>> relationships:"null",
>>> art_culture:"null",
>>> hobbies_interests:"null",
>>> science_technologies:"null",
>>> computers_internet:"null",
>>> education:"null",
>>> sport:"null",
>>> movies:"null",
>>> travelling:"null",
>>> health:"null",
>>> companies_brands:"null",
>>> more:"null"
>>> 
>>> 
>>> neo4j-server.properties:
>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>> org.neo4j.server.webserver.port=8474
>>> dbms.security.auth_enabled=false
>>> 
>>> 
>>> neo4j-wrapper.conf:
>>> wrapper.java.initmemory=8000
>>> wrapper.java.maxmemory=8000
>>> wrapper.java.additional=-Xmn2G
>>> 
>>> neo4j.properties:
>>> dbms.pagecache.memory=5G
>>> keep_logical_logs=false
>>> remote_shell_enabled=false
>>> cache_type=soft
>>> online_backup_enabled=false
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to neo4j+un...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> 
> 
