You forgot the value for --id-type: it has to be "--id-type integer", which the script actually takes care of.
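In the trace quoted below, --id-type is followed directly by --delimiter, so the import tool gets no value for it. That appears to be what triggers the NullPointerException in Args.interpretOption. Here is a minimal Python sketch that flags such options before running the import. `find_valueless_options` is a hypothetical helper, not part of Neo4j, and it assumes that every option in this particular command takes exactly one value. That holds for the neo4j-import options used in this thread.

```python
from typing import List


def find_valueless_options(args: List[str]) -> List[str]:
    """Return every "--option" that is not followed by a value.

    Assumption: each option in this command takes exactly one value.
    """
    missing = []
    for i, token in enumerate(args):
        if token.startswith("--"):
            nxt = args[i + 1] if i + 1 < len(args) else None
            if nxt is None or nxt.startswith("--"):
                missing.append(token)
    return missing


# Argument list as it appears in the bash -x trace below (value missing).
broken = [
    "--into", "pokec.db", "--id-type", "--delimiter", "TAB", "--quote", "Ö",
    "--nodes:PROFILES", "profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz",
    "--relationships:RELATION", "relationships_header.txt,soc-pokec-relationships.txt.gz",
]
# Same list with the "integer" value the script passes.
fixed = broken[:3] + ["integer"] + broken[3:]

print(find_valueless_options(broken))  # ['--id-type']
print(find_valueless_options(fixed))   # []
```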
> On 11.06.2015 at 14:55, Michael Hunger
> <[email protected]> wrote:
>
> I used 2.2.2, 2.3-M02 and 2.3-SNAPSHOT for the import.
>
> I can also provide you with the freshly imported databases. Let me know.
>
> Michael
>
>> On 11.06.2015 at 14:45, Frank Celler <[email protected]> wrote:
>>
>> Hi Michael,
>>
>> thanks a lot for the import script. I'm currently trying to generate a new
>> database dump (with Neo4j 2.2.2 Community), but I get the following error:
>>
>> $ bash -x ./import.sh
>> ...
>> + rm -rf pokec.db
>> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö
>> --nodes:PROFILES
>> profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz
>> --relationships:RELATION
>> relationships_header.txt,soc-pokec-relationships.txt.gz
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>>
>> My Java version is:
>>
>> java version "1.8.0_45"
>> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>>
>> Do I need 2.3 for the import?
>>
>> Thanks
>> Frank
>>
>> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>> I created an import script, which I added to my repository.
>> On my machine it imports the data in 35 seconds.
>>
>> It uses more sensible types for the fields and also skips all the null
>> values.
>>
>> It also uses a numeric id for the primary key, which makes more sense to me.
>>
>> If you optimized the dataset for Neo4j, you could even use the node id as
>> the primary key, since the input data has a sane, incrementing id; then it
>> would be way faster.
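The import script described above uses sensible field types, skips null values, and uses a numeric primary key. A minimal Python sketch of that per-record conversion follows. It is not the actual script. The field groups come from the "correct datatypes" notes later in this thread. Three further points are assumptions: the "P/P<number>" key format (taken from the sample data), the timestamp format "2012-05-25 11:20:00.0", and reading those timestamps as UTC. `convert_profile` and the output property `id` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Field groups taken from the "correct datatypes" notes in this thread.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}
# Assumption: every key has the form "P/P<number>", as in the sample data.
KEY_PATTERN = re.compile(r"^P/P(\d+)$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str) -> int:
    # Assumption: timestamps look like "2012-05-25 11:20:00.0" and are UTC.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)


def convert_profile(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical converter: drop "null" values and type the known fields."""
    row: Dict[str, Any] = {}
    for field, value in raw.items():
        if value is None or value == "null":
            continue  # "null" should never be stored
        if field == "_key":
            match = KEY_PATTERN.match(value)
            if match is None:
                raise ValueError(f"unexpected key format: {value!r}")
            row["id"] = int(match.group(1))  # hypothetical numeric id property
        elif field in INT_FIELDS:
            row[field] = int(value)
        elif field in TIME_FIELDS:
            row[field] = to_epoch_millis(value)
        else:
            row[field] = value
    return row


sample = {
    "_key": "P/P1", "public": "1", "completion_percentage": "14", "gender": "1",
    "region": "zilinsky kraj, zilina", "last_login": "2012-05-25 11:20:00.0",
    "registration": "2005-04-03 00:00:00.0", "AGE": 26,
    "body_type": "null", "eye_color": "null",
}
row = convert_profile(sample)
print(row["id"], row["AGE"], row["registration"])  # 1 26 1112486400000
```

Storing times as epoch milliseconds matches the "long/time" suggestion in the thread. Dropping "null" entries entirely, rather than writing empty properties, is what keeps them out of the store.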
>>
>> I also added a neo4j-pokec directory with queries that use that numeric id
>> as input (I probably should also use an input.json file that doesn't
>> contain "Pxxx" strings; not sure what the perf impact of converting those
>> strings is).
>>
>> Cheers, Michael
>>
>> https://github.com/jexp/nosql-tests/tree/my-import
>>
>> I did some preliminary testing:
>>
>> Neo4j 2.2
>>
>> node benchmark.js neo4j-pokec -t
>> shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address 127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: shortest path, 19 items
>> INFO Total Time for 19 requests: 85 ms
>> INFO Average: 4.47 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors, 500 items
>> INFO Total Time for 500 requests: 428 ms
>> INFO Average: 0.86 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors2, 500 items
>> INFO Total Time for 500 requests: 4850 ms
>> INFO Average: 9.7 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: aggregate, 1 items
>> INFO Total Time for 1 requests: 14036 ms
>> INFO Average: 14036 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: single reads, 100000 items
>> INFO Total Time for 100000 requests: 83473 ms
>> INFO Average: 0.83 ms
>> INFO
>> -----------------------------------------------------------------------------
>>
>>
>> Neo4j 2.3
>>
>> node benchmark.js neo4j-pokec -t
>> shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address 127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: shortest path, 19 items
>> INFO Total Time for 19 requests: 69 ms
>> INFO Average: 3.63 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors, 500 items
>> INFO Total Time for 500 requests: 431 ms
>> INFO Average: 0.86 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: neighbors2, 500 items
>> INFO Total Time for 500 requests: 3441 ms
>> INFO Average: 6.88 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: aggregate, 1 items
>> INFO Total Time for 1 requests: 2848 ms
>> INFO Average: 2848 ms
>> INFO
>> -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO
>> -----------------------------------------------------------------------------
>> INFO Neo4J: single reads, 100000 items
>> INFO Total Time for 100000 requests: 77760 ms
>> INFO Average: 0.78 ms
>> INFO
>> -----------------------------------------------------------------------------
>> DONE
>>
>>
>>> On 10.06.2015 at 18:55, Frank Celler <fce...@gmail.com> wrote:
>>>
>>> Hi Michael,
>>>
>>> thanks for sharing your preliminary findings. I'll incorporate them into
>>> the benchmark suite and rerun the tests. I've seen that there is a 30-day
>>> trial for the Enterprise edition, so I can test that as well.
>>>
>>> Is it possible to upload the database where you changed the AGE attribute?
>>> Or is there an easy Cypher command to change the type?
>>>
>>> Thanks
>>> Frank
>>>
>>>
>>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>> I also did some experiments but haven't had the time to finish them yet;
>>> here are my observations so far:
>>>
>>> ArangoDB measurement
>>>
>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
>>> - seraph -> replace with node-neo4j 2.0.RC1
>>> - uses the 2-year-old /cypher API, doesn't send the X-Stream:true header
>>> - does not do efficient auth (encodes creds on every call)
>>> - doesn't do pooling
>>> - suboptimal queries
>>> - make sure the concurrency level is adequate for the setup (utilize all
>>> cores but don't flood them, e.g. use async.eachLimit)
>>> - warm up with nodes and rels `MATCH ()--() return count(*);`
>>> - Enterprise has better vertical read/write scalability than Community
>>> - use a 12G-24G heap, 2G new gen (-Xmn2G)
>>> - set the pagecache to 2.5G + growth (e.g. another 2.5G)
>>> - in 2.2, set cache_type=soft or cache_type=none depending on available
>>> heap
>>> - fix property encoding, e.g. AGE as int, not string; don't store "null"!!
>>> -> affects esp. the aggregate query
>>> - don't re-run the benchmark on the same store, start from the initial one
>>> -> creating and deleting the additional PROFILES_TEMP nodes affects
>>> repeatability of results
>>>
>>> correct datatypes:
>>>
>>> * "null" should *never be stored*
>>> * int: public, gender, completion_percentage, AGE
>>> * long/time: last_login, registration
>>> * optionally as label: gender, public
>>>
>>> -> test repository (WIP): with changes in description.js and benchmark.js
>>>
>>> https://github.com/jexp/nosql-tests/tree/node-neo4j
>>>
>>> queries for neo4j-shell:
>>>
>>> export from="P/P1"
>>> export to="P/P277"
>>>
>>> export key="P/P1"
>>>
>>> // warmup
>>> MATCH ()--() return count(*);
>>> // 61,245,128 rows
>>>
>>> MATCH (s:PROFILES) return count(*);
>>> // 1,632,803 profiles
>>> // 1.15 s
>>>
>>> profile
>>>
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT
>>> n._key;
>>> // 295 rows 5 ms
>>>
>>>
>>> // 1st degree neighbours
>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>> // 14 rows 1 ms
>>>
>>> // 2nd degree neighbours
>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>> MATCH (x)-->(n:PROFILES)
>>> RETURN DISTINCT n._key;
>>> // 283 rows 6 ms
>>>
>>> // shortest path
>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}),
>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>>> // 1 ms, don't return the full data, only keys, like in the other DBs
>>>
>>> // aggregation
>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>> // 22 s -> should rather be 1.5 s
>>>
>>> // single read
>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>> // or
>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>> // 1 row with 59 properties 1 ms
>>>
>>> // single writes
>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>>
>>> // delete all nodes with a certain label
>>> // loop until it returns 0
>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() DELETE
>>> n,r RETURN count(*) as deleted
>>> ----
>>>
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key
>>> as key RETURN count(*);
>>> // 295 count 5-6 ms
>>>
>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>> // 140 rows -> 1502 ms, that's how it should be
>>>
>>> sample data:
>>>
>>> _key:"P/P1",
>>> public:"1",
>>> completion_percentage:"14",
>>> gender:"1",
>>> region:"zilinsky kraj, zilina",
>>> last_login:"2012-05-25 11:20:00.0",
>>> registration:"2005-04-03 00:00:00.0",
>>> AGE:26,
>>> body:"185 cm, 90 kg",
>>> I_am_working_in_field:"it",
>>> spoken_languages:"anglicky",
>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>> pets:"mam psa",
>>> body_type:"null",
>>> my_eyesight:"null",
>>> eye_color:"null",
>>> hair_color:"null",
>>> hair_type:"null",
>>> completed_level_of_education:"null",
>>> favourite_color:"null",
>>> relation_to_smoking:"null",
>>> relation_to_alcohol:"null",
>>> sign_in_zodiac:"null",
>>> on_pokec_i_am_looking_for:"null",
>>> love_is_for_me:"null",
>>> relation_to_casual_sex:"null",
>>> my_partner_should_be:"null",
>>> marital_status:"null",
>>> children:"null",
>>> relation_to_children:"null",
>>> I_like_movies:"null",
>>> I_like_watching_movie:"null",
>>> I_like_music:"null",
>>> I_mostly_like_listening_to_music:"null",
>>> the_idea_of_good_evening:"null",
>>> I_like_specialties_from_kitchen:"null",
>>> fun:"null",
>>> I_am_going_to_concerts:"null",
>>> my_active_sports:"null",
>>> my_passive_sports:"null",
>>> profession:"null",
>>> I_like_books:"null",
>>> life_style:"null",
>>> music:"null",
>>> cars:"null",
>>> politics:"null",
>>> relationships:"null",
>>> art_culture:"null",
>>> hobbies_interests:"null",
>>> science_technologies:"null",
>>> computers_internet:"null",
>>> education:"null",
>>> sport:"null",
>>> movies:"null",
>>> travelling:"null",
>>> health:"null",
>>> companies_brands:"null",
>>> more:"null"
>>>
>>>
>>> neo4j-server.properties:
>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>> org.neo4j.server.webserver.port=8474
>>> dbms.security.auth_enabled=false
>>>
>>>
>>> neo4j-wrapper.conf:
>>> wrapper.java.initmemory=8000
>>> wrapper.java.maxmemory=8000
>>> wrapper.java.additional=-Xmn2G
>>>
>>> neo4j.properties:
>>> dbms.pagecache.memory=5G
>>> keep_logical_logs=false
>>> remote_shell_enabled=false
>>> cache_type=soft
>>> online_backup_enabled=false
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to neo4j+un...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
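The two preliminary benchmark runs quoted in this thread, Neo4j 2.2 and 2.3, can be compared directly. This small Python sketch recomputes the per-request averages and the 2.2 to 2.3 speedup from the logged totals. The numbers are copied from the logs. The `RUNS` layout and the helper names are only for illustration. The sketch shows the arithmetic and says nothing about why the versions differ.

```python
from typing import Dict, Tuple

# (number of requests, total time in ms) per test, copied from the logs above.
RUNS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "2.2": {"shortest": (19, 85), "neighbors": (500, 428), "neighbors2": (500, 4850),
            "aggregation": (1, 14036), "singleRead": (100000, 83473)},
    "2.3": {"shortest": (19, 69), "neighbors": (500, 431), "neighbors2": (500, 3441),
            "aggregation": (1, 2848), "singleRead": (100000, 77760)},
}


def average_ms(run: Dict[str, Tuple[int, int]], test: str) -> float:
    requests, total_ms = run[test]
    return total_ms / requests


def speedup(test: str) -> float:
    # Both runs issue the same number of requests per test, so the ratio of
    # total times equals the ratio of averages (> 1 means 2.3 was faster).
    return RUNS["2.2"][test][1] / RUNS["2.3"][test][1]


for test in RUNS["2.2"]:
    old_avg = average_ms(RUNS["2.2"], test)
    new_avg = average_ms(RUNS["2.3"], test)
    print(f"{test}: {old_avg:.2f} ms -> {new_avg:.2f} ms ({speedup(test):.2f}x)")
```

This prints:

    shortest: 4.47 ms -> 3.63 ms (1.23x)
    neighbors: 0.86 ms -> 0.86 ms (0.99x)
    neighbors2: 9.70 ms -> 6.88 ms (1.41x)
    aggregation: 14036.00 ms -> 2848.00 ms (4.93x)
    singleRead: 0.83 ms -> 0.78 ms (1.07x)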
