I used 2.2.2, 2.3-M02 and 2.3-SNAPSHOT for the import. I can also provide you with the freshly imported databases; let me know.
Michael

> On 11.06.2015 at 14:45, Frank Celler <[email protected]> wrote:
>
> Hi Michael,
>
> thanks a lot for the import script. I'm currently trying to generate a new
> database dump (with Neo4J 2.2.2 Community), but I get the following error:
>
> $ bash -x ./import.sh
> ...
> + rm -rf pokec.db
> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö
> --nodes:PROFILES profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz
> --relationships:RELATION
> relationships_header.txt,soc-pokec-relationships.txt.gz
> Exception in thread "main" java.lang.NullPointerException
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>
> My Java version is:
>
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> Do I need 2.3 for the import?
>
> Thanks
> Frank
>
> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
> I created an import script and added it to my repository.
> On my machine it imports the data in 35 seconds.
>
> It uses more sensible types for the fields and also skips all the null
> values.
>
> It also uses a numeric id for the primary key, which makes more sense to me.
>
> If you optimized the dataset for Neo4j, you could even use the node id as the
> primary key, since the input data has a sane, incrementing id; that would be
> much faster.
>
> I also added a neo4j-pokec directory with queries that use the numeric id as
> input (it should probably also use an input.json file that doesn't contain
> "Pxxx" strings; I'm not sure what the performance impact of converting those
> strings is).
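The cleanup described above (sensible field types, no "null" values, a numeric primary key) is done by neo4j-import in Michael's script. As a rough, hypothetical illustration of the same idea applied to a single profile record in Node.js, the sketch below drops "null" placeholders, converts the int and time fields named in the "correct datatypes" notes of the earlier message quoted further down, and derives a numeric id from keys shaped like the sample's "P/P1". The helper names, the `id` property name, and treating the zone-less timestamps as UTC are assumptions, not part of the thread.

```javascript
// Hypothetical preprocessing sketch; NOT the import script from the my-import branch.
// Field lists follow the "correct datatypes" notes in this thread.
const INT_FIELDS = new Set(["public", "gender", "completion_percentage", "AGE"]);
const TIME_FIELDS = new Set(["last_login", "registration"]);

// Assumes keys look like the sample data: "P/P1" -> 1.
function keyToNumericId(key) {
  const match = /^P\/P(\d+)$/.exec(key);
  if (match === null) {
    throw new Error(`unexpected key format: ${key}`);
  }
  return Number(match[1]);
}

// Parses "2012-05-25 11:20:00.0" into epoch milliseconds.
// Assumption: the timestamps carry no zone, so they are treated as UTC here.
function toEpochMillis(text) {
  const m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/.exec(text);
  if (m === null) {
    return undefined;
  }
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return Date.UTC(y, mo - 1, d, h, mi, s);
}

// Drops "null" placeholders, converts typed fields, and adds a numeric id.
function normalizeProfile(raw) {
  const out = {};
  for (const [name, value] of Object.entries(raw)) {
    // Never store the string "null" (or empty values) as a property.
    if (value === null || value === "null" || value === "") continue;
    if (INT_FIELDS.has(name)) {
      const n = Number.parseInt(String(value), 10);
      if (!Number.isNaN(n)) out[name] = n;
    } else if (TIME_FIELDS.has(name)) {
      const t = toEpochMillis(String(value));
      if (t !== undefined) out[name] = t;
    } else {
      out[name] = value;
    }
  }
  // "id" is a hypothetical property name for the numeric primary key.
  if (typeof raw._key === "string") out.id = keyToNumericId(raw._key);
  return out;
}

console.log(normalizeProfile({ _key: "P/P1", AGE: 26, gender: "1", body_type: "null" }));
// prints { _key: 'P/P1', AGE: 26, gender: 1, id: 1 }
```

Keeping `_key` alongside the numeric `id` lets the existing string-keyed queries keep working while the numeric variant is benchmarked.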
>
> Cheers, Michael
>
> https://github.com/jexp/nosql-tests/tree/my-import
>
> I did some preliminary testing:
>
> Neo4j 2.2
>
> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address 127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: shortest path, 19 items
> INFO Total Time for 19 requests: 85 ms
> INFO Average: 4.47 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors, 500 items
> INFO Total Time for 500 requests: 428 ms
> INFO Average: 0.86 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors2, 500 items
> INFO Total Time for 500 requests: 4850 ms
> INFO Average: 9.7 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: aggregate, 1 items
> INFO Total Time for 1 requests: 14036 ms
> INFO Average: 14036 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: single reads, 100000 items
> INFO Total Time for 100000 requests: 83473 ms
> INFO Average: 0.83 ms
> INFO
> -----------------------------------------------------------------------------
>
>
> Neo4j 2.3
>
> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address 127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: shortest path, 19 items
> INFO Total Time for 19 requests: 69 ms
> INFO Average: 3.63 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors, 500 items
> INFO Total Time for 500 requests: 431 ms
> INFO Average: 0.86 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: neighbors2, 500 items
> INFO Total Time for 500 requests: 3441 ms
> INFO Average: 6.88 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: aggregate, 1 items
> INFO Total Time for 1 requests: 2848 ms
> INFO Average: 2848 ms
> INFO
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: single reads, 100000 items
> INFO Total Time for 100000 requests: 77760 ms
> INFO Average: 0.78 ms
> INFO
> -----------------------------------------------------------------------------
> DONE
>
>
>> On 10.06.2015 at 18:55, Frank Celler <fce...@gmail.com> wrote:
>>
>> Hi Michael,
>>
>> thanks for sharing your preliminary findings. I'll incorporate them into the
>> benchmark suite and rerun the tests. I've seen that there is a 30-day trial
>> of the Enterprise Edition, so I can test that as well.
>>
>> Is it possible to upload the database where you changed the AGE attribute?
>> Or is there an easy Cypher command to change the type?
>>
>> Thanks
>> Frank
>>
>>
>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>> I also did some experiments but haven't had time to finish them yet; here
>> are my observations so far:
>>
>> ArangoDB measurement
>>
>> - index -> constraint: `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
>> - seraph -> replace with node-neo4j 2.0.RC1
>> - uses the two-year-old /cypher API and doesn't send the X-Stream:true header
>> - does not do efficient auth (encodes the credentials on every call)
>> - doesn't do connection pooling
>> - suboptimal queries
>> - make sure the concurrency level is adequate for the setup (utilize all
>>   cores but don't flood the server, e.g. use async.eachLimit)
>> - warm up with nodes and rels: `MATCH ()--() return count(*);`
>> - Enterprise has better vertical read/write scalability than Community
>> - use a 12G-24G heap with a 2G new generation (-Xmn2G)
>> - set the page cache to 2.5G plus room for growth (e.g. another 2.5G)
>> - in 2.2, set cache_type=soft or cache_type=none depending on the available heap
>> - fix the property encoding, e.g. AGE as int, not string; don't store "null"!!
>>   -> this especially affects the aggregate query
>> - don't re-run the benchmark on the same store; start from the initial one
>>   -> creating and deleting the additional PROFILES_TEMP nodes affects the
>>   repeatability of the results
>>
>> Correct datatypes:
>>
>> * "null" should *never be stored*
>> * int: public, gender, completion_percentage, AGE
>> * long/time: last_login, registration
>> * optionally as labels: gender, public
>>
>> -> test repository (WIP), with changes in description.js and benchmark.js:
>>
>> https://github.com/jexp/nosql-tests/tree/node-neo4j
>>
>> Queries for neo4j-shell:
>>
>> export from="P/P1"
>> export to="P/P277"
>>
>> export key="P/P1"
>>
>> // warmup
>> MATCH ()--() return count(*);
>> // 61,245,128 rows
>>
>> MATCH (s:PROFILES) return count(*);
>> // 1,632,803 profiles
>> // 1.15 s
>>
>> profile
>>
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
>> // 295 rows 5 ms
>>
>>
>> // 1st degree neighbours
>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>> // 14 rows 1 ms
>>
>> // 2nd degree neighbours
>> MATCH (s:PROFILES {_key:{key}})-->(x)
>> MATCH (x)-->(n:PROFILES)
>> RETURN DISTINCT n._key;
>> // 283 rows 6 ms
>>
>> // shortest path
>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}),
>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>> // 1 ms; don't return the full data, only the keys, as with the other DBs
>>
>> // aggregation
>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>> // 22 s -> should rather be 1.5 s
>>
>> // single read
>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>> // or
>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>> // 1 row with 59 properties 1 ms
>>
>> // single writes
>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>
>> // delete all nodes with a certain label
>> // loop until it returns 0
>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000
>> OPTIONAL MATCH (n)-[r]-()
>> DELETE n,r RETURN count(*) as deleted;
>> ----
>>
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES)
>> WITH DISTINCT n._key as key RETURN count(*);
>> // 295 count 5-6 ms
>>
>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>> // 140 rows -> 1502 ms, that's how it should be
>>
>> Sample data:
>>
>> _key:"P/P1",
>> public:"1",
>> completion_percentage:"14",
>> gender:"1",
>> region:"zilinsky kraj, zilina",
>> last_login:"2012-05-25 11:20:00.0",
>> registration:"2005-04-03 00:00:00.0",
>> AGE:26,
>> body:"185 cm, 90 kg",
>> I_am_working_in_field:"it",
>> spoken_languages:"anglicky",
>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
>> I_most_enjoy_good_food:"v dobrej restauracii",
>> pets:"mam psa",
>> body_type:"null",
>> my_eyesight:"null",
>> eye_color:"null",
>> hair_color:"null",
>> hair_type:"null",
>> completed_level_of_education:"null",
>> favourite_color:"null",
>> relation_to_smoking:"null",
>> relation_to_alcohol:"null",
>> sign_in_zodiac:"null",
>> on_pokec_i_am_looking_for:"null",
>> love_is_for_me:"null",
>> relation_to_casual_sex:"null",
>> my_partner_should_be:"null",
>> marital_status:"null",
>> children:"null",
>> relation_to_children:"null",
>> I_like_movies:"null",
>> I_like_watching_movie:"null",
>> I_like_music:"null",
>> I_mostly_like_listening_to_music:"null",
>> the_idea_of_good_evening:"null",
>> I_like_specialties_from_kitchen:"null",
>> fun:"null",
>> I_am_going_to_concerts:"null",
>> my_active_sports:"null",
>> my_passive_sports:"null",
>> profession:"null",
>> I_like_books:"null",
>> life_style:"null",
>> music:"null",
>> cars:"null",
>> politics:"null",
>> relationships:"null",
>> art_culture:"null",
>> hobbies_interests:"null",
>> science_technologies:"null",
>> computers_internet:"null",
>> education:"null",
>> sport:"null",
>> movies:"null",
>> travelling:"null",
>> health:"null",
>> companies_brands:"null",
>> more:"null"
>>
>>
>> neo4j-server.properties:
>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>> org.neo4j.server.webserver.port=8474
>> dbms.security.auth_enabled=false
>>
>>
>> neo4j-wrapper.conf:
>> wrapper.java.initmemory=8000
>> wrapper.java.maxmemory=8000
>> wrapper.java.additional=-Xmn2G
>>
>> neo4j.properties:
>> dbms.pagecache.memory=5G
>> keep_logical_logs=false
>> remote_shell_enabled=false
>> cache_type=soft
>> online_backup_enabled=false
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
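Several of the observations quoted above concern the Node.js client rather than the database: the legacy /cypher API without the X-Stream:true header, re-encoding credentials on every call, no connection pooling, and unbounded concurrency. The following is a minimal sketch of those fixes using only Node's built-in `http` module; it is not node-neo4j or the nosql-tests code. It assumes Neo4j 2.x's transactional endpoint (`/db/data/transaction/commit`), and the function names and the `maxSockets` value are hypothetical. `forEachLimited` is a hand-rolled stand-in for async.eachLimit.

```javascript
const http = require("http");

// One keep-alive agent shared by all requests, so sockets are pooled instead
// of opened per call. maxSockets = 16 is an assumed value; tune it per setup.
const agent = new http.Agent({ keepAlive: true, maxSockets: 16 });

// Builds request options once; the credentials are base64-encoded a single
// time here rather than on every call. Omit `user` when auth is disabled.
function makeRequestOptions(host, port, user, password) {
  const headers = {
    "Content-Type": "application/json",
    Accept: "application/json; charset=UTF-8",
    "X-Stream": "true", // ask Neo4j 2.x to stream the response
  };
  if (user !== undefined) {
    const token = Buffer.from(`${user}:${password}`).toString("base64");
    headers.Authorization = `Basic ${token}`;
  }
  return { host, port, agent, method: "POST", path: "/db/data/transaction/commit", headers };
}

// JSON body for one parameterized statement on the transactional endpoint.
function cypherBody(statement, parameters) {
  return JSON.stringify({ statements: [{ statement, parameters }] });
}

// Sends one statement and resolves with the parsed JSON response.
function runCypher(options, statement, parameters) {
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { data += chunk; });
      res.on("end", () => {
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end(cypherBody(statement, parameters));
  });
}

// Runs worker(item, index) over items with at most `limit` calls in flight.
// `limit` must be >= 1. Each "lane" pulls the next unclaimed index until done.
async function forEachLimited(items, limit, worker) {
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++;
      await worker(items[index], index);
    }
  }
  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) lanes.push(lane());
  await Promise.all(lanes);
}
```

With the server configured as in the neo4j-server.properties above (port 8474, auth disabled), a single-read run could build `const opts = makeRequestOptions("127.0.0.1", 8474);` once and then call `forEachLimited(keys, 8, (key) => runCypher(opts, "MATCH (s:PROFILES {_key:{key}}) RETURN s", { key }))`, where `keys` and the limit of 8 are placeholders to be sized so all cores stay busy without flooding the server.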
