I created an import script which I added to my repository.
On my machine it imports the data in 35 seconds.

It uses more sensible types for the fields and also skips all the null 
values.

It also uses a numeric id for the primary key which makes more sense to me.

If you optimized the dataset for Neo4j, you could even use the node id as the 
primary key, since the input data has a sane, incrementing id. That would be 
way faster.
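
As a rough illustration of the type cleanup described above (not the actual 
import script), the per-record conversion could look like the following Python 
sketch. The field names are taken from the datatype list quoted further down 
in this thread; the function name `clean_profile`, the UTC interpretation of 
the timestamps, and the epoch-millisecond encoding are assumptions made for 
the example.

```python
from datetime import datetime, timezone

# Field names from the "correct datatypes" list quoted later in the thread.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}


def clean_profile(raw):
    """Return a copy of raw with "null" values dropped and types fixed.

    Hypothetical helper: the real import script may work differently.
    """
    cleaned = {}
    for name, value in raw.items():
        # Skip missing values instead of storing the string "null".
        if value is None or value == "null":
            continue
        if name in INT_FIELDS:
            cleaned[name] = int(value)
        elif name in TIME_FIELDS:
            # Sample timestamps look like "2012-05-25 11:20:00.0".
            # Assumption: treat them as UTC and store epoch milliseconds.
            parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
            parsed = parsed.replace(tzinfo=timezone.utc)
            cleaned[name] = int(parsed.timestamp() * 1000)
        else:
            cleaned[name] = value
    return cleaned
```

With the sample record from the quoted message, this would keep only the 
handful of properties that carry data instead of all 59.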

I also added a neo4j-pokec directory with queries that use that numeric id as 
input (it should probably also use an input.json file that doesn't contain 
"Pxxx" strings; I'm not sure what the perf impact of converting those strings 
is).
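
For reference, converting the "Pxxx" strings up front is cheap to do once. 
The sketch below assumes the keys look like "P1" or "P/P1" (as in the sample 
data quoted below); the helper names are made up for illustration, and the 
actual layout of input.json is not shown here, so the functions just work on 
plain strings.

```python
import re

# Assumption: keys are "P<digits>" with an optional "P/" collection prefix.
_KEY_PATTERN = re.compile(r"^(?:P/)?P(\d+)$")


def key_to_id(key):
    """Convert a key such as "P/P277" or "P277" to the integer 277."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return int(match.group(1))


def convert_keys(keys):
    """Convert a list of string keys to numeric ids, preserving order."""
    return [key_to_id(key) for key in keys]
```

Running this once over the input file would keep the string handling out of 
the timed part of the benchmark.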

Cheers, Michael

https://github.com/jexp/nosql-tests/tree/my-import 

I did some preliminary testing:

Neo4j 2.2

node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
INFO using server address  127.0.0.1
INFO start
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 85 ms
INFO Average: 4.47 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 428 ms
INFO Average: 0.86 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545530
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 4850 ms
INFO Average: 9.7 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 14036 ms
INFO Average: 14036 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 83473 ms
INFO Average: 0.83 ms
INFO -----------------------------------------------------------------------------


Neo4j 2.3

node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
INFO using server address  127.0.0.1
INFO start
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 69 ms
INFO Average: 3.63 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 431 ms
INFO Average: 0.86 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545530
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 3441 ms
INFO Average: 6.88 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 2848 ms
INFO Average: 2848 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 77760 ms
INFO Average: 0.78 ms
INFO -----------------------------------------------------------------------------
DONE
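
To put the two runs side by side, the ratios of the total times can be 
computed with a few lines of Python. The numbers are copied from the logs 
above; each is a single run on my machine, so treat the ratios as indicative 
only. The function name `speedups` is just for this example.

```python
# Total times in ms, copied from the Neo4j 2.2 and 2.3 logs above.
NEO4J_22 = {
    "shortest": 85,
    "neighbors": 428,
    "neighbors2": 4850,
    "aggregation": 14036,
    "singleRead": 83473,
}
NEO4J_23 = {
    "shortest": 69,
    "neighbors": 431,
    "neighbors2": 3441,
    "aggregation": 2848,
    "singleRead": 77760,
}


def speedups(before, after):
    """Return before/after time ratios per test (above 1.0 means faster)."""
    return {name: before[name] / after[name] for name in before}


for name, factor in speedups(NEO4J_22, NEO4J_23).items():
    print(f"{name:12s} {factor:.2f}x")
```

The aggregation comes out at roughly 4.9x and neighbors2 at roughly 1.4x, 
while neighbors is essentially unchanged.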


> On 10.06.2015 at 18:55, Frank Celler <[email protected]> wrote:
> 
> Hi Michael,
> 
> thanks for sharing your preliminary findings. I'll incorporate them into the 
> benchmark suite and rerun the tests. I've seen that there is a 30-day trial 
> for the enterprise edition, so I can test that as well.
> 
> Is it possible to upload the database where you changed the AGE attribute? Or 
> is there an easy Cypher command to change the type?
> 
> Thanks
>   Frank
> 
> 
> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
> I also did some experiments but didn't have the time to finish yet. Here are 
> my observations so far:
> 
> ArangoDB Measurement
> 
> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
> - seraph -> replace with node-neo4j 2.0.RC1 
>   - uses 2 year old /cypher api, doesn't send X-Stream:true header
>   - does not do efficient auth (encode creds on every call)
>   - doesn't do pooling
> - suboptimal queries
> - make sure the concurrency level is adequate for the setup (utilize all 
> cores but don't flood, use e.g. async.eachLimit)
> - warmup with nodes and rels `MATCH ()--() return count(*);`
> - enterprise with better vertical read/write scalability vs. community
> - Use 12G-24G heap, 2G new gen (-Xmn2G)
> - pagecache to 2.5G + growth (e.g. another 2.5G)
> - in 2.2 set cache_type = soft or cache_type=none depending on available heap
> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>   -> affects esp. aggregate query
> - don't re-run the benchmark on the same store, start at the initial one
>   -> creating and deleting the additional PROFILES_TEMP nodes affects 
> repeatability of results
> 
> correct datatypes:
> 
> * "null" should *never be stored*
> * int: public, gender, completion_percentage, AGE,
> * long/time: last_login, registration 
> * optionally as label: gender, public
> 
>   -> test repository (WIP): with changes in description.js and benchmark.js
> 
> https://github.com/jexp/nosql-tests/tree/node-neo4j 
> 
> queries for neo4j-shell:
> 
> export from="P/P1"
> export to="P/P277"
> 
> export key="P/P1"
> 
> // warmup
> MATCH ()--() return count(*);
> // 61.245.128 rows
> 
> MATCH (s:PROFILES) return count(*);
> // 1.632.803 profiles
> // 1.15 s
> 
> profile
> 
> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
> // 295 rows 5 ms
> 
> 
> // 1st degree neighbours
> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
> // 14 rows 1ms 
> 
> // 2nd degree neighbours
> MATCH (s:PROFILES {_key:{key}})-->(x)
> MATCH (x)-->(n:PROFILES)
> RETURN DISTINCT n._key;
> // 283 rows 6 ms
> 
> // shortest path
> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
> // 1 ms, don't return the full data only keys like in the other db's
> 
> // aggregation
> MATCH (f:PROFILES) RETURN f.AGE, count(*);
> // 22s -> should be rather 1.5s
> 
> // single read
> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
> // or
> MATCH (s:PROFILES {_key:{key}}) RETURN s;
> // 1 row with 59 properties 1 ms
> 
> // single writes
> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
> 
> // delete all nodes with a certain label
> // loop until returns 0
> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) as deleted
> ----
> 
> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key as key RETURN count(*);
> // 295 count 5-6ms
> 
> MATCH (f:PROFILES) return id(f) % 140, count(*);
> // 140 rows -> 1502 ms that's how it should be
> 
> sample data:
> 
> _key:"P/P1",
> public:"1",
> completion_percentage:"14",
> gender:"1",
> region:"zilinsky kraj, zilina",
> last_login:"2012-05-25 11:20:00.0",
> registration:"2005-04-03 00:00:00.0",
> AGE:26,
> body:"185 cm, 90 kg",
> I_am_working_in_field:"it",
> spoken_languages:"anglicky",
> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
> I_most_enjoy_good_food:"v dobrej restauracii",
> pets:"mam psa",
> body_type:"null",
> my_eyesight:"null",
> eye_color:"null",
> hair_color:"null",
> hair_type:"null",
> completed_level_of_education:"null",
> favourite_color:"null",
> relation_to_smoking:"null",
> relation_to_alcohol:"null",
> sign_in_zodiac:"null",
> on_pokec_i_am_looking_for:"null",
> love_is_for_me:"null",
> relation_to_casual_sex:"null",
> my_partner_should_be:"null",
> marital_status:"null",
> children:"null",
> relation_to_children:"null",
> I_like_movies:"null",
> I_like_watching_movie:"null",
> I_like_music:"null",
> I_mostly_like_listening_to_music:"null",
> the_idea_of_good_evening:"null",
> I_like_specialties_from_kitchen:"null",
> fun:"null",
> I_am_going_to_concerts:"null",
> my_active_sports:"null",
> my_passive_sports:"null",
> profession:"null",
> I_like_books:"null",
> life_style:"null",
> music:"null",
> cars:"null",
> politics:"null",
> relationships:"null",
> art_culture:"null",
> hobbies_interests:"null",
> science_technologies:"null",
> computers_internet:"null",
> education:"null",
> sport:"null",
> movies:"null",
> travelling:"null",
> health:"null",
> companies_brands:"null",
> more:"null"
> 
> 
> neo4j-server.properties:
> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
> org.neo4j.server.webserver.port=8474
> dbms.security.auth_enabled=false
> 
> 
> neo4j-wrapper.conf:
> wrapper.java.initmemory=8000
> wrapper.java.maxmemory=8000
> wrapper.java.additional=-Xmn2G
> 
> neo4j.properties:
> dbms.pagecache.memory=5G
> keep_logical_logs=false
> remote_shell_enabled=false
> cache_type=soft
> online_backup_enabled=false
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
