Re: [Neo4j] ArangoDB vs. Neo4j -- what's up? article of Jun 04, 2015

Michael Hunger Thu, 11 Jun 2015 05:56:07 -0700

I used 2.2.2, 2.3-M02, and 2.3-SNAPSHOT for the import.

I can also provide you with the freshly imported databases. Let me know.


Michael

> On 11.06.2015 at 14:45, Frank Celler <[email protected]> wrote:
> 
> Hi Michael,
> 
> thanks a lot for the import script. I'm currently trying to generate a new 
> database dump (with Neo4j 2.2.2 Community), but I get the following error:
> 
> $ bash -x ./import.sh 
> ...
> + rm -rf pokec.db
> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö 
> --nodes:PROFILES profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz 
> --relationships:RELATION 
> relationships_header.txt,soc-pokec-relationships.txt.gz
> Exception in thread "main" java.lang.NullPointerException
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
> 
> My java is
> 
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
> 
> Do I need 2.3 for the import?
> 
> Thanks
>   Frank
> 
> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
> I created an import script which I added to my repository.
> On my machine it imports the data in 35 seconds.
> 
> It uses more sensible types for the fields and also skips all the null 
> values.
> 
> It also uses a numeric id for the primary key which makes more sense to me.
> 
> If you optimized the dataset for Neo4j, you could even use the node id as the 
> primary key, since the input data has a sane, incrementing id; that would be 
> way faster.
> 
> I also added a neo4j-pokec directory with queries that use that numeric id as 
> input (it should probably also use an input.json file that doesn't contain 
> "Pxxx" strings; I'm not sure what the perf impact of converting those strings 
> is).
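
The "Pxxx" conversion mentioned above could be done once when preparing the input rather than per query. A minimal sketch, assuming the keys look like "P/P1" or "P1" as they do elsewhere in this thread; toNumericId is a hypothetical helper and not part of the benchmark repository.

```javascript
// Hypothetical helper: strip the "P" or "P/P" prefix and return the numeric id.
// The accepted key shapes ("P1", "P/P1") are an assumption based on this thread.
function toNumericId(key) {
  const match = /^(?:P\/)?P(\d+)$/.exec(key);
  if (!match) {
    throw new Error("unexpected key format: " + key);
  }
  return Number.parseInt(match[1], 10);
}

// Convert a small batch of keys up front.
console.log(["P/P1", "P277"].map(toNumericId));
```

Rejecting unknown formats instead of returning NaN keeps a bad input file from silently producing lookups that match nothing.
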
> 
> Cheers, Michael
> 
> https://github.com/jexp/nosql-tests/tree/my-import 
> 
> I did some preliminary testing
> 
> Neo4j 2.2
> 
> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address  127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: shortest path, 19 items
> INFO Total Time for 19 requests: 85 ms
> INFO Average: 4.47 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors, 500 items
> INFO Total Time for 500 requests: 428 ms
> INFO Average: 0.86 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors2, 500 items
> INFO Total Time for 500 requests: 4850 ms
> INFO Average: 9.7 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: aggregate, 1 items
> INFO Total Time for 1 requests: 14036 ms
> INFO Average: 14036 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: single reads, 100000 items
> INFO Total Time for 100000 requests: 83473 ms
> INFO Average: 0.83 ms
> INFO 
> -----------------------------------------------------------------------------
> 
> 
> Neo4j 2.3
> 
> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address  127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: shortest path, 19 items
> INFO Total Time for 19 requests: 69 ms
> INFO Average: 3.63 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors, 500 items
> INFO Total Time for 500 requests: 431 ms
> INFO Average: 0.86 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors2, 500 items
> INFO Total Time for 500 requests: 3441 ms
> INFO Average: 6.88 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: aggregate, 1 items
> INFO Total Time for 1 requests: 2848 ms
> INFO Average: 2848 ms
> INFO 
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO 
> -----------------------------------------------------------------------------
> INFO Neo4J: single reads, 100000 items
> INFO Total Time for 100000 requests: 77760 ms
> INFO Average: 0.78 ms
> INFO 
> -----------------------------------------------------------------------------
> DONE
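
To make the two logs above easier to compare, the totals can be turned into per-request averages and a 2.2-to-2.3 ratio. This is only arithmetic on the numbers already printed; the object layout and the compare helper are made up for illustration.

```javascript
// Request counts and total times (ms) copied from the two benchmark logs above.
const runs = {
  "shortest path": { requests: 19, v22: 85, v23: 69 },
  "neighbors": { requests: 500, v22: 428, v23: 431 },
  "neighbors2": { requests: 500, v22: 4850, v23: 3441 },
  "aggregate": { requests: 1, v22: 14036, v23: 2848 },
  "single reads": { requests: 100000, v22: 83473, v23: 77760 },
};

// Illustrative helper: average per request for each run, plus the ratio 2.2 / 2.3.
function compare(run) {
  return {
    avg22: run.v22 / run.requests,
    avg23: run.v23 / run.requests,
    speedup: run.v22 / run.v23,
  };
}

for (const [name, run] of Object.entries(runs)) {
  const c = compare(run);
  console.log(
    `${name}: ${c.avg22.toFixed(2)} ms -> ${c.avg23.toFixed(2)} ms (x${c.speedup.toFixed(2)})`
  );
}
```

On these numbers the aggregation drops from 14036 ms to 2848 ms (roughly 4.9x), neighbors2 improves by about 1.4x, and neighbors is essentially unchanged.
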
> 
> 
>> On 10.06.2015 at 18:55, Frank Celler <fce...@gmail.com> wrote:
>> 
>> Hi Michael,
>> 
>> thanks for sharing your preliminary findings. I'll incorporate them into the 
>> benchmark suite and rerun the tests. I've seen that there is a 30-day trial 
>> for the enterprise edition, so I can test that as well.
>> 
>> Is it possible to upload the database where you changed the AGE attribute? 
>> Or is there an easy Cypher command to change the type?
>> 
>> Thanks
>>   Frank
>> 
>> 
>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>> I also did some experiments but didn't have the time to finish yet, here are 
>> my observations so far:
>> 
>> ArangoDB Measurement
>> 
>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS 
>> UNIQUE;`
>> - seraph -> replace with node-neo4j 2.0.RC1 
>>   - uses 2 year old /cypher api, doesn't send X-Stream:true header
>>   - does not do efficient auth (encode creds on every call)
>>   - doesn't do pooling
>> - suboptimal queries
>> - make sure the concurrency level is adequate for the setup (utilize all 
>> cores but don't flood, use e.g. async.eachLimit)
>> - warmup with nodes and rels `MATCH ()--() return count(*);`
>> - enterprise with better vertical read/write scalability vs. community
>> - Use 12G-24G heap, 2G new gen (-Xmn2G)
>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>> - in 2.2 set cache_type = soft or cache_type=none depending on available heap
>> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>>   -> affects esp. aggregate query
>> - don't re-run the benchmark on the same store, start at the initial one
>>   -> creating and deleting the additional PROFILES_TEMP nodes affects 
>> repeatability of results
>> 
>> correct datatypes:
>> 
>> * "null" should *never be stored*
>> * int: public, gender, completion_percentage, AGE,
>> * long/time: last_login, registration 
>> * optionally as label: gender, public
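
The type list above can be applied as a small preprocessing step before import. A minimal sketch, assuming records arrive as plain objects with string values like the sample data in this thread; normalizeProfile and toEpochMillis are hypothetical helpers, and treating the timestamps as UTC is an assumption.

```javascript
// Field groups taken from the datatype list above.
const INT_FIELDS = new Set(["public", "gender", "completion_percentage", "AGE"]);
const TIME_FIELDS = new Set(["last_login", "registration"]);
const TIMESTAMP = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/;

// "2012-05-25 11:20:00.0" -> epoch milliseconds (interpreted as UTC, an assumption).
function toEpochMillis(text) {
  const m = TIMESTAMP.exec(String(text));
  if (!m) return undefined;
  const [year, month, day, hour, minute, second] = m.slice(1).map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Returns a copy of the record without "null" values and with typed fields.
function normalizeProfile(raw) {
  const out = {};
  for (const [name, value] of Object.entries(raw)) {
    if (value === null || value === undefined || value === "null") continue;
    let converted = value;
    if (INT_FIELDS.has(name)) {
      converted = Number.parseInt(value, 10);
      if (Number.isNaN(converted)) continue; // drop unparseable numbers
    } else if (TIME_FIELDS.has(name)) {
      converted = toEpochMillis(value);
      if (converted === undefined) continue; // drop unparseable timestamps
    }
    out[name] = converted;
  }
  return out;
}

console.log(
  normalizeProfile({
    _key: "P/P1",
    public: "1",
    completion_percentage: "14",
    last_login: "2012-05-25 11:20:00.0",
    AGE: 26,
    body_type: "null",
  })
);
```

Dropping the "null" strings removes most properties from sparse profiles, which is the part that matters for the aggregation query.
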
>> 
>>   -> test repository (WIP): with changes in description.js and benchmark.js
>> 
>> https://github.com/jexp/nosql-tests/tree/node-neo4j 
>> 
>> queries for neo4j-shell:
>> 
>> export from="P/P1"
>> export to="P/P277"
>> 
>> export key="P/P1"
>> 
>> // warmup
>> MATCH ()--() return count(*);
>> // 61,245,128 rows
>> 
>> MATCH (s:PROFILES) return count(*);
>> // 1,632,803 profiles
>> // 1.15 s
>> 
>> profile
>> 
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
>> // 295 rows 5 ms
>> 
>> 
>> // 1st degree neighbours
>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>> // 14 rows 1ms 
>> 
>> // 2nd degree neighbours
>> MATCH (s:PROFILES {_key:{key}})-->(x)
>> MATCH (x)-->(n:PROFILES)
>> RETURN DISTINCT n._key;
>> // 283 rows 6 ms
>> 
>> // shortest path
>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>> // 1 ms; return only the keys rather than the full data, as the other DBs do
>> 
>> // aggregation
>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>> // 22s -> should be rather 1.5s
>> 
>> // single read
>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>> // or
>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>> // 1 row with 59 properties 1 ms
>> 
>> // single writes
>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>> 
>> // delete all nodes with a certain label
>> // loop until returns 0
>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE 
>> n,r RETURN count(*) as deleted
>> ----
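
The "loop until returns 0" note on the batched delete above can be driven from the benchmark side roughly like this. It is a sketch only: runQuery is a hypothetical async function that sends the query and resolves to the `deleted` value, to be wired to whatever driver is in use, and the demo uses a stub instead of a real server.

```javascript
// The batched delete from above, as a single query string.
const DELETE_BATCH =
  "MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 " +
  "OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) AS deleted";

// Repeats the batch until it reports 0 rows. runQuery is hypothetical:
// (query) => Promise<number>. maxRounds guards against an endless loop.
async function deleteInBatches(runQuery, maxRounds = 10000) {
  let totalRows = 0;
  for (let round = 0; round < maxRounds; round++) {
    const rows = await runQuery(DELETE_BATCH);
    if (rows === 0) return totalRows;
    totalRows += rows;
  }
  throw new Error("batch delete did not finish within " + maxRounds + " rounds");
}

// Demo with a stub that pretends three non-empty batches remain.
const remaining = [5000, 5000, 1200, 0];
deleteInBatches(async () => remaining.shift()).then((total) =>
  console.log("rows reported as deleted:", total)
);
```

Note that count(*) counts result rows, so a node with several relationships contributes more than one row to the reported number.
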
>> 
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key 
>> as key RETURN count(*);
>> // 295 count 5-6ms
>> 
>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>> // 140 rows -> 1502 ms that's how it should be
>> 
>> sample data:
>> 
>> _key:"P/P1",
>> public:"1",
>> completion_percentage:"14",
>> gender:"1",
>> region:"zilinsky kraj, zilina",
>> last_login:"2012-05-25 11:20:00.0",
>> registration:"2005-04-03 00:00:00.0",
>> AGE:26,
>> body:"185 cm, 90 kg",
>> I_am_working_in_field:"it",
>> spoken_languages:"anglicky",
>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, 
>> divadlo",
>> I_most_enjoy_good_food:"v dobrej restauracii",
>> pets:"mam psa",
>> body_type:"null",
>> my_eyesight:"null",
>> eye_color:"null",
>> hair_color:"null",
>> hair_type:"null",
>> completed_level_of_education:"null",
>> favourite_color:"null",
>> relation_to_smoking:"null",
>> relation_to_alcohol:"null",
>> sign_in_zodiac:"null",
>> on_pokec_i_am_looking_for:"null",
>> love_is_for_me:"null",
>> relation_to_casual_sex:"null",
>> my_partner_should_be:"null",
>> marital_status:"null",
>> children:"null",
>> relation_to_children:"null",
>> I_like_movies:"null",
>> I_like_watching_movie:"null",
>> I_like_music:"null",
>> I_mostly_like_listening_to_music:"null",
>> the_idea_of_good_evening:"null",
>> I_like_specialties_from_kitchen:"null",
>> fun:"null",
>> I_am_going_to_concerts:"null",
>> my_active_sports:"null",
>> my_passive_sports:"null",
>> profession:"null",
>> I_like_books:"null",
>> life_style:"null",
>> music:"null",
>> cars:"null",
>> politics:"null",
>> relationships:"null",
>> art_culture:"null",
>> hobbies_interests:"null",
>> science_technologies:"null",
>> computers_internet:"null",
>> education:"null",
>> sport:"null",
>> movies:"null",
>> travelling:"null",
>> health:"null",
>> companies_brands:"null",
>> more:"null"
>> 
>> 
>> neo4j-server.properties:
>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>> org.neo4j.server.webserver.port=8474
>> dbms.security.auth_enabled=false
>> 
>> 
>> neo4j-wrapper.conf:
>> wrapper.java.initmemory=8000
>> wrapper.java.maxmemory=8000
>> wrapper.java.additional=-Xmn2G
>> 
>> neo4j.properties:
>> dbms.pagecache.memory=5G
>> keep_logical_logs=false
>> remote_shell_enabled=false
>> cache_type=soft
>> online_backup_enabled=false
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
> 
> 

