I created an import script and added it to my repository. On my machine it imports the data in 35 seconds.

It uses more sensible types for the fields and skips all the null values. It also uses a numeric id for the primary key, which makes more sense to me. If you optimized the dataset for Neo4j, you could even use the node id as the primary key, since the input data has a sane, incrementing id; that would be way faster.

I also added a neo4j-pokec directory with queries that use that numeric id as input (it should probably also use an input.json file that doesn't contain "Pxxx" strings; I'm not sure what the perf impact of converting those strings is).

Cheers, Michael

https://github.com/jexp/nosql-tests/tree/my-import

I did some preliminary testing:

Neo4j 2.2

node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
INFO using server address 127.0.0.1
INFO start
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 85 ms
INFO Average: 4.47 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 428 ms
INFO Average: 0.86 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545530
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 4850 ms
INFO Average: 9.7 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 14036 ms
INFO Average: 14036 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 83473 ms
INFO Average: 0.83 ms
INFO -----------------------------------------------------------------------------

Neo4j 2.3

node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
INFO using server address 127.0.0.1
INFO start
INFO executing shortest path for 19 paths
INFO total paths length: 104
INFO -----------------------------------------------------------------------------
INFO Neo4J: shortest path, 19 items
INFO Total Time for 19 requests: 69 ms
INFO Average: 3.63 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors for 500 elements
INFO total number of neighbors found: 9102
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors, 500 items
INFO Total Time for 500 requests: 431 ms
INFO Average: 0.86 ms
INFO -----------------------------------------------------------------------------
INFO executing neighbors 2nd degree for 500 elements
INFO total number of neighbors2 found: 545530
INFO -----------------------------------------------------------------------------
INFO Neo4J: neighbors2, 500 items
INFO Total Time for 500 requests: 3441 ms
INFO Average: 6.88 ms
INFO -----------------------------------------------------------------------------
INFO executing aggregation
INFO -----------------------------------------------------------------------------
INFO Neo4J: aggregate, 1 items
INFO Total Time for 1 requests: 2848 ms
INFO Average: 2848 ms
INFO -----------------------------------------------------------------------------
INFO executing single read with 100000 documents
INFO -----------------------------------------------------------------------------
INFO Neo4J: single reads, 100000 items
INFO Total Time for 100000 requests: 77760 ms
INFO Average: 0.78 ms
INFO -----------------------------------------------------------------------------
DONE

> On 10.06.2015 at 18:55, Frank Celler <[email protected]> wrote:
>
> Hi Michael,
>
> thanks for sharing your preliminary findings. I'll incorporate them into the
> benchmark suite and rerun the tests. I've seen that there is a 30-day trial
> for the enterprise edition, so I can test that as well.
>
> Is it possible to upload the database where you changed the AGE attribute? Or
> is there an easy Cypher command to change the type?
>
> Thanks
> Frank
>
>
> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
> I also did some experiments but didn't have the time to finish yet; here are
> my observations so far:
>
> ArangoDB Measurement
>
> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS
> UNIQUE;`
> - seraph -> replace with node-neo4j 2.0.RC1
> - uses 2 year old /cypher api, doesn't send X-Stream:true header
> - does not do efficient auth (encode creds on every call)
> - doesn't do pooling
> - suboptimal queries
> - make sure the concurrency level is adequate for the setup (utilize all
> cores but don't flood, use e.g. async.eachLimit)
> - warmup with nodes and rels `MATCH ()--() return count(*);`
> - enterprise with better vertical read/write scalability vs. community
> - Use 12G-24G heap, 2G new gen (-Xmn2G)
> - pagecache to 2.5G + growth (e.g. another 2.5G)
> - in 2.2 set cache_type=soft or cache_type=none depending on available heap
> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
> -> affects esp. aggregate query
> - don't re-run the benchmark on the same store, start at the initial one
> -> creating and deleting the additional PROFILES_TEMP nodes affects
> repeatability of results
>
> correct datatypes:
>
> * "null" should *never be stored*
> * int: public, gender, completion_percentage, AGE
> * long/time: last_login, registration
> * optionally as label: gender, public
>
> -> test repository (WIP): with changes in description.js and benchmark.js
>
> https://github.com/jexp/nosql-tests/tree/node-neo4j
>
> queries for neo4j-shell:
>
> export from="P/P1"
> export to="P/P277"
>
> export key="P/P1"
>
> // warmup
> MATCH ()--() return count(*);
> // 61.245.128 rows
>
> MATCH (s:PROFILES) return count(*);
> // 1.632.803 profiles
> // 1.15 s
>
> profile
>
> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
> // 295 rows 5 ms
>
>
> // 1st degree neighbours
> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
> // 14 rows 1ms
>
> // 2nd degree neighbours
> MATCH (s:PROFILES {_key:{key}})-->(x)
> MATCH (x)-->(n:PROFILES)
> RETURN DISTINCT n._key;
> // 283 rows 6 ms
>
> // shortest path
> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}),
> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
> // 1 ms, don't return the full data, only keys like in the other db's
>
> // aggregation
> MATCH (f:PROFILES) RETURN f.AGE, count(*);
> // 22s -> should be rather 1.5s
>
> // single read
> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
> // or
> MATCH (s:PROFILES {_key:{key}}) RETURN s;
> // 1 row with 59 properties 1 ms
>
> // single writes
> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>
> // delete all nodes with a certain label
> // loop until returns 0
> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE
> n,r RETURN count(*) as deleted
> ----
>
> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key as
> key RETURN count(*);
> // 295 count 5-6ms
>
> MATCH (f:PROFILES) return id(f) % 140, count(*);
> // 140 rows -> 1502 ms that's how it should be
>
> sample data:
>
> _key:"P/P1",
> public:"1",
> completion_percentage:"14",
> gender:"1",
> region:"zilinsky kraj, zilina",
> last_login:"2012-05-25 11:20:00.0",
> registration:"2005-04-03 00:00:00.0",
> AGE:26,
> body:"185 cm, 90 kg",
> I_am_working_in_field:"it",
> spoken_languages:"anglicky",
> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia,
> divadlo",
> I_most_enjoy_good_food:"v dobrej restauracii",
> pets:"mam psa",
> body_type:"null",
> my_eyesight:"null",
> eye_color:"null",
> hair_color:"null",
> hair_type:"null",
> completed_level_of_education:"null",
> favourite_color:"null",
> relation_to_smoking:"null",
> relation_to_alcohol:"null",
> sign_in_zodiac:"null",
> on_pokec_i_am_looking_for:"null",
> love_is_for_me:"null",
> relation_to_casual_sex:"null",
> my_partner_should_be:"null",
> marital_status:"null",
> children:"null",
> relation_to_children:"null",
> I_like_movies:"null",
> I_like_watching_movie:"null",
> I_like_music:"null",
> I_mostly_like_listening_to_music:"null",
> the_idea_of_good_evening:"null",
> I_like_specialties_from_kitchen:"null",
> fun:"null",
> I_am_going_to_concerts:"null",
> my_active_sports:"null",
> my_passive_sports:"null",
> profession:"null",
> I_like_books:"null",
> life_style:"null",
> music:"null",
> cars:"null",
> politics:"null",
> relationships:"null",
> art_culture:"null",
> hobbies_interests:"null",
> science_technologies:"null",
> computers_internet:"null",
> education:"null",
> sport:"null",
> movies:"null",
> travelling:"null",
> health:"null",
> companies_brands:"null",
> more:"null"
>
>
> neo4j-server.properties:
> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
> org.neo4j.server.webserver.port=8474
> dbms.security.auth_enabled=false
>
>
> neo4j-wrapper.conf:
> wrapper.java.initmemory=8000
> wrapper.java.maxmemory=8000
> wrapper.java.additional=-Xmn2G
>
> neo4j.properties:
> dbms.pagecache.memory=5G
> keep_logical_logs=false
> remote_shell_enabled=false
> cache_type=soft
> online_backup_enabled=false
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
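
The datatype notes in the quoted message (never store "null", ints for public/gender/completion_percentage/AGE, long timestamps for last_login/registration) and the numeric primary key could be applied to a single record roughly like this. This is only a Python sketch, not the import script from the repository: the function names, the "id" output field, and treating the timestamps as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Field groups taken from the "correct datatypes" list in the thread.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}

EPOCH = datetime(1970, 1, 1)


def to_epoch_millis(text):
    """Parse 'YYYY-MM-DD HH:MM:SS.f' into epoch milliseconds.

    Assumption: the timestamps are UTC; the thread does not say.
    """
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def clean_profile(raw):
    """Return a typed copy of one profile record without "null" placeholders."""
    cleaned = {}
    for field, value in raw.items():
        if value is None or value == "null":
            continue  # "null" should never be stored
        if field == "_key":
            # "P/P1" -> 1: numeric id from the digits after the last "P".
            # The output name "id" is hypothetical.
            cleaned["id"] = int(value.rsplit("P", 1)[1])
        elif field in INT_FIELDS:
            cleaned[field] = int(value)
        elif field in TIME_FIELDS:
            cleaned[field] = to_epoch_millis(value)
        else:
            cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    # A few properties from the sample record in the thread.
    sample = {
        "_key": "P/P1",
        "public": "1",
        "AGE": 26,
        "pets": "mam psa",
        "body_type": "null",
    }
    print(clean_profile(sample))
```

Dropping the "null" strings before import also shrinks the sample record from 59 properties to the 14 that actually carry data.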
