It worked perfectly.

On Thursday, 11 June 2015 15:36:30 UTC+2, Michael Hunger wrote:
>
> you forgot --id-type integer
>
> the script actually takes care of it
>
> On 11 June 2015 at 14:55, Michael Hunger <[email protected]> wrote:
>
> I used 2.2.2, 2.3-M02, and 2.3-SNAPSHOT for the import.
>
> I can also provide you with the freshly imported databases. Let me know.
>
> Michael
>
> On 11 June 2015 at 14:45, Frank Celler <[email protected]> wrote:
>
> Hi Michael,
>
> thanks a lot for the import script. I'm currently trying to generate a new
> database dump (with Neo4J 2.2.2 Community). But I get the following error:
>
> $ bash -x ./import.sh
> ...
> + rm -rf pokec.db
> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö
> --nodes:PROFILES
> profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz
> --relationships:RELATION
> relationships_header.txt,soc-pokec-relationships.txt.gz
> Exception in thread "main" java.lang.NullPointerException
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>
> My Java is
>
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> Do I need 2.3 for the import?
>
> Thanks
> Frank
>
> On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>>
>> I created an import script which I added to my repository.
>> On my machine it imports the data in 35 seconds.
>>
>> It uses more sensible types for the fields and also skips all the null
>> values.
>>
>> It also uses a numeric id for the primary key which makes more sense to
>> me.
>>
>> If you optimized the dataset for Neo4j you could even use the node-id as
>> primary key, since the input data has a sane, incrementing id; then it
>> would be way faster.
>>
>> I also added a neo4j-pokec directory with queries to use that numeric id
>> as input (probably should also use an input.json file that doesn't contain
>> "Pxxx" strings, not sure what the perf impact is of converting those
>> strings).
>>
>> Cheers, Michael
>>
>> https://github.com/jexp/nosql-tests/tree/my-import
>>
>> I did some preliminary testing
>>
>> Neo4j 2.2
>>
>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address 127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *shortest* path, 19 items
>> INFO Total Time for 19 requests: 85 ms
>> INFO Average: *4.47 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *neighbors*, 500 items
>> INFO Total Time for 500 requests: 428 ms
>> INFO Average: *0.86 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *neighbors2*, 500 items
>> INFO Total Time for 500 requests: 4850 ms
>> INFO Average: *9.7 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *aggregate*, 1 items
>> INFO Total Time for 1 requests: 14036 ms
>> INFO Average: *14036 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *single reads*, 100000 items
>> INFO Total Time for 100000 requests: 83473 ms
>> INFO Average: *0.83 ms*
>> INFO -----------------------------------------------------------------------------
>>
>>
>> Neo4j 2.3
>>
>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>> INFO using server address 127.0.0.1
>> INFO start
>> INFO executing shortest path for 19 paths
>> INFO total paths length: 104
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *shortest* path, 19 items
>> INFO Total Time for 19 requests: 69 ms
>> INFO Average: *3.63 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors for 500 elements
>> INFO total number of neighbors found: 9102
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *neighbors*, 500 items
>> INFO Total Time for 500 requests: 431 ms
>> INFO Average: *0.86 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing neighbors 2nd degree for 500 elements
>> INFO total number of neighbors2 found: 545530
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *neighbors2*, 500 items
>> INFO Total Time for 500 requests: 3441 ms
>> INFO Average: *6.88 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing aggregation
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *aggregate*, 1 items
>> INFO Total Time for 1 requests: 2848 ms
>> INFO Average: *2848 ms*
>> INFO -----------------------------------------------------------------------------
>> INFO executing single read with 100000 documents
>> INFO -----------------------------------------------------------------------------
>> INFO Neo4J: *single reads*, 100000 items
>> INFO Total Time for 100000 requests: 77760 ms
>> INFO Average: *0.78 ms*
>> INFO -----------------------------------------------------------------------------
>> DONE
>>
>>
>> On 10 June 2015 at 18:55, Frank Celler <[email protected]> wrote:
>>
>> Hi Michael,
>>
>> thanks for sharing your preliminary findings. I'll incorporate them into
>> the benchmark suite and rerun the tests. I've seen that there is a 30-day
>> trial for the enterprise edition, so I can test that as well.
>>
>> Is it possible to upload the database where you changed the AGE
>> attribute? Or is there any easy Cypher command to change the type?
>>
>> Thanks
>> Frank
>>
>>
>> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>>
>>> I also did some experiments but didn't have the time to finish yet, here
>>> are my observations so far:
>>>
>>> *Arangodb Measurement*
>>>
>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
>>> - seraph -> replace with node-neo4j 2.0.RC1
>>> - uses 2 year old /cypher api, doesn't send X-Stream:true header
>>> - does not do efficient auth (encode creds on every call)
>>> - doesn't do pooling
>>> - suboptimal queries
>>> - make sure the concurrency level is adequate for the setup (utilize all
>>> cores but don't flood, use e.g. async.eachWithLimit)
>>> - warmup with nodes and rels `MATCH ()--() return count(*);`
>>> - enterprise with better vertical read/write scalability vs. community
>>> - Use 12G-24G heap, 2G new gen (-Xmn2G)
>>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>>> - in 2.2 set cache_type = soft or cache_type=none depending on available
>>> heap
>>> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>>> -> affects esp. aggregate query
>>> - don't re-run the benchmark on the same store, start at the initial one
>>> -> creating and deleting the additional PROFILES_TEMP nodes affects
>>> repeatability of results
>>>
>>> correct datatypes:
>>>
>>> * "null" should *never be stored*
>>> * int: public, gender, completion_percentage, AGE
>>> * long/time: last_login, registration
>>> * optionally as label: gender, public
>>>
>>> -> test repository (WIP): with changes in *description.js and
>>> benchmark.js*
>>>
>>> https://github.com/jexp/nosql-tests/tree/node-neo4j
>>>
>>> queries for neo4j-shell:
>>>
>>> export from="P/P1"
>>> export to="P/P277"
>>>
>>> export key="P/P1"
>>>
>>> // warmup
>>> MATCH ()--() return count(*);
>>> // 61,245,128 rows
>>>
>>> MATCH (s:PROFILES) return count(*);
>>> // 1,632,803 profiles
>>> // 1.15 s
>>>
>>> profile
>>>
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT n._key;
>>> // 295 rows 5 ms
>>>
>>>
>>> // 1st degree neighbours
>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>> // 14 rows 1ms
>>>
>>> // 2nd degree neighbours
>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>> MATCH (x)-->(n:PROFILES)
>>> RETURN DISTINCT n._key;
>>> // 283 rows 6 ms
>>>
>>> // shortest path
>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}),
>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as path;
>>> // 1 ms, don't return the full data only keys like in the other db's
>>>
>>> // aggregation
>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>> // 22s -> should be rather 1.5s
>>>
>>> // single read
>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>> // or
>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>> // 1 row with 59 properties 1 ms
>>>
>>> // single writes
>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>>
>>> // delete all nodes with a certain label
>>> // loop until returns 0
>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-()
>>> DELETE n,r RETURN count(*) as deleted
>>> ----
>>>
>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT n._key as key RETURN count(*);
>>> // 295 count 5-6ms
>>>
>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>> // 140 rows -> 1502 ms that's how it should be
>>>
>>> sample data:
>>>
>>> _key:"P/P1",
>>> public:"1",
>>> completion_percentage:"14",
>>> gender:"1",
>>> region:"zilinsky kraj, zilina",
>>> last_login:"2012-05-25 11:20:00.0",
>>> registration:"2005-04-03 00:00:00.0",
>>> AGE:26,
>>> body:"185 cm, 90 kg",
>>> I_am_working_in_field:"it",
>>> spoken_languages:"anglicky",
>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, divadlo",
>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>> pets:"mam psa",
>>> body_type:"null",
>>> my_eyesight:"null",
>>> eye_color:"null",
>>> hair_color:"null",
>>> hair_type:"null",
>>> completed_level_of_education:"null",
>>> favourite_color:"null",
>>> relation_to_smoking:"null",
>>> relation_to_alcohol:"null",
>>> sign_in_zodiac:"null",
>>> on_pokec_i_am_looking_for:"null",
>>> love_is_for_me:"null",
>>> relation_to_casual_sex:"null",
>>> my_partner_should_be:"null",
>>> marital_status:"null",
>>> children:"null",
>>> relation_to_children:"null",
>>> I_like_movies:"null",
>>> I_like_watching_movie:"null",
>>> I_like_music:"null",
>>> I_mostly_like_listening_to_music:"null",
>>> the_idea_of_good_evening:"null",
>>> I_like_specialties_from_kitchen:"null",
>>> fun:"null",
>>> I_am_going_to_concerts:"null",
>>> my_active_sports:"null",
>>> my_passive_sports:"null",
>>> profession:"null",
>>> I_like_books:"null",
>>> life_style:"null",
>>> music:"null",
>>> cars:"null",
>>> politics:"null",
>>> relationships:"null",
>>> art_culture:"null",
>>> hobbies_interests:"null",
>>> science_technologies:"null",
>>> computers_internet:"null",
>>> education:"null",
>>> sport:"null",
>>> movies:"null",
>>> travelling:"null",
>>> health:"null",
>>> companies_brands:"null",
>>> more:"null"
>>>
>>>
>>> neo4j-server.properties:
>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>> org.neo4j.server.webserver.port=8474
>>> dbms.security.auth_enabled=false
>>>
>>>
>>> neo4j-wrapper.conf:
>>> wrapper.java.initmemory=8000
>>> wrapper.java.maxmemory=8000
>>> wrapper.java.additional=-Xmn2G
>>>
>>> neo4j.properties:
>>> dbms.pagecache.memory=5G
>>> keep_logical_logs=false
>>> remote_shell_enabled=false
>>> cache_type=soft
>>> online_backup_enabled=false
>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
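The "correct datatypes" notes in the thread (never store "null"; public, gender, completion_percentage and AGE as int; last_login and registration as long/time) could be applied to a raw profile record roughly as follows. This is only a sketch and not the import script from the repository: the function names, the choice of epoch milliseconds for the long values, and the assumption that the timestamps are UTC are all illustrative.

```python
from datetime import datetime, timezone

# Field groups taken from the "correct datatypes" list in the thread.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}


def to_epoch_millis(text):
    """Parse 'YYYY-MM-DD HH:MM:SS.f' into epoch milliseconds.

    Assumption: the timestamps are in UTC (the thread does not say).
    """
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


def clean_profile(raw):
    """Return a copy of raw without "null" values and with typed fields."""
    cleaned = {}
    for key, value in raw.items():
        # "null" should never be stored, so the property is skipped entirely.
        if value is None or value == "null":
            continue
        if key in INT_FIELDS:
            cleaned[key] = int(value)
        elif key in TIME_FIELDS:
            cleaned[key] = to_epoch_millis(value)
        else:
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    # A few fields from the sample data quoted in the thread.
    sample = {
        "_key": "P/P1",
        "public": "1",
        "completion_percentage": "14",
        "gender": "1",
        "registration": "2005-04-03 00:00:00.0",
        "AGE": 26,
        "body_type": "null",
    }
    print(clean_profile(sample))
```

Dropping the "null" properties instead of storing them keeps the store smaller, and typed AGE values are what the thread expects to help the aggregation query most.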
