Hi Michael,
Thanks a lot for the import script. I'm currently trying to generate a new
database dump (with Neo4j 2.2.2 Community), but I get the following error:
$ bash -x ./import.sh
...
+ rm -rf pokec.db
+ ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö
--nodes:PROFILES
profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz
--relationships:RELATION
relationships_header.txt,soc-pokec-relationships.txt.gz
Exception in thread "main" java.lang.NullPointerException
    at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
    at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
    at org.neo4j.helpers.Args.interpretOption(Args.java:490)
    at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
    at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
My Java version is:
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Do I need 2.3 for the import?
Thanks
Frank
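
A note on the command shown in the trace above: `--id-type` is followed
directly by `--delimiter`, so it appears to have been passed without a value.
That may be what triggers the NullPointerException while the tool interprets
its options, but this is an assumption and is not confirmed in this thread.
A minimal Python sketch, using a hypothetical helper name, that flags options
in an argument list that are immediately followed by another option:

```python
def options_without_value(argv):
    """Return options that are directly followed by another option or by nothing.

    Hypothetical helper for illustration only. It knows nothing about
    neo4j-import itself, so genuine boolean flags would be reported as well.
    """
    missing = []
    for i, arg in enumerate(argv):
        if not arg.startswith("--"):
            continue
        following = argv[i + 1] if i + 1 < len(argv) else None
        if following is None or following.startswith("--"):
            missing.append(arg)
    return missing


# Argument list copied from the failing call above.
args = [
    "--into", "pokec.db", "--id-type", "--delimiter", "TAB", "--quote", "Ö",
    "--nodes:PROFILES",
    "profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz",
    "--relationships:RELATION",
    "relationships_header.txt,soc-pokec-relationships.txt.gz",
]
print(options_without_value(args))  # prints ['--id-type']
```

If import.sh fills that value from a shell variable, an empty variable would
produce a command line of exactly this shape.
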
On Thursday, 11 June 2015 13:40:55 UTC+2, Michael Hunger wrote:
>
> I created an import script which I added to my repository.
> On my machine it imports the data in 35 seconds.
>
> It uses more sensible types for the fields and also skips all the null
> values.
>
> It also uses a numeric id for the primary key which makes more sense to me.
>
> If you optimized the dataset for Neo4j, you could even use the node id as
> the primary key, since the input data has a sane, incrementing id; that
> would be way faster.
>
> I also added a neo4j-pokec directory with queries that use that numeric id
> as input (it should probably also use an input.json file that doesn't contain
> "Pxxx" strings; I'm not sure what the perf impact of converting those
> strings is).
>
> Cheers, Michael
>
> https://github.com/jexp/nosql-tests/tree/my-import
>
> I did some preliminary testing
>
> Neo4j 2.2
>
> node benchmark.js neo4j-pokec -t
> shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address 127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *shortest* path, 19 items
> INFO Total Time for 19 requests: 85 ms
> INFO Average: *4.47 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *neighbors*, 500 items
> INFO Total Time for 500 requests: 428 ms
> INFO Average: *0.86 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *neighbors2*, 500 items
> INFO Total Time for 500 requests: 4850 ms
> INFO Average: *9.7 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *aggregate*, 1 items
> INFO Total Time for 1 requests: 14036 ms
> INFO Average: *14036 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *single reads*, 100000 items
> INFO Total Time for 100000 requests: 83473 ms
> INFO Average: *0.83 ms*
> INFO
> -----------------------------------------------------------------------------
>
>
> Neo4j 2.3
>
> node benchmark.js neo4j-pokec -t
> shortest,neighbors,neighbors2,aggregation,singleRead
> INFO using server address 127.0.0.1
> INFO start
> INFO executing shortest path for 19 paths
> INFO total paths length: 104
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *shortest* path, 19 items
> INFO Total Time for 19 requests: 69 ms
> INFO Average: *3.63 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors for 500 elements
> INFO total number of neighbors found: 9102
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *neighbors*, 500 items
> INFO Total Time for 500 requests: 431 ms
> INFO Average: *0.86 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing neighbors 2nd degree for 500 elements
> INFO total number of neighbors2 found: 545530
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *neighbors2*, 500 items
> INFO Total Time for 500 requests: 3441 ms
> INFO Average: *6.88 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing aggregation
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *aggregate*, 1 items
> INFO Total Time for 1 requests: 2848 ms
> INFO Average: *2848 ms*
> INFO
> -----------------------------------------------------------------------------
> INFO executing single read with 100000 documents
> INFO
> -----------------------------------------------------------------------------
> INFO Neo4J: *single reads*, 100000 items
> INFO Total Time for 100000 requests: 77760 ms
> INFO Average: *0.78 ms*
> INFO
> -----------------------------------------------------------------------------
> DONE
>
>
> On 10 June 2015 at 18:55, Frank Celler <[email protected]> wrote:
>
> Hi Michael,
>
> thanks for sharing your preliminary findings. I'll incorporate them into
> the benchmark suite and rerun the tests. I've seen that there is a 30-day
> trial for the Enterprise Edition, so I can test that as well.
>
> Is it possible to upload the database where you changed the AGE attribute?
> Or is there an easy Cypher command to change the type?
>
> Thanks
> Frank
>
>
> On Wednesday, 10 June 2015 17:27:05 UTC+2, Michael Hunger wrote:
>>
>> I also did some experiments but didn't have the time to finish yet, here
>> are my observations so far:
>>
>> *ArangoDB Measurement*
>>
>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS
>> UNIQUE;`
>> - seraph -> replace with node-neo4j 2.0.RC1
>> - uses 2 year old /cypher api, doesn't send X-Stream:true header
>> - does not do efficient auth (encode creds on every call)
>> - doesn't do pooling
>> - suboptimal queries
>> - make sure the concurrency level is adequate for the setup (utilize all
>> cores but don't flood, use e.g. async.eachLimit)
>> - warmup with nodes and rels `MATCH ()--() return count(*);`
>> - enterprise with better vertical read/write scalability vs. community
>> - Use 12G-24G heap, 2G new gen (-Xmn2G)
>> - pagecache to 2.5G + growth (e.g. another 2.5G)
>> - in 2.2 set cache_type = soft or cache_type=none depending on available
>> heap
>> - fix property encoding, e.g. AGE as int not string, don't store "null" !!
>> -> affects esp. aggregate query
>> - don't re-run the benchmark on the same store, start at the initial one
>> -> creating and deleting the additional PROFILES_TEMP nodes affects
>> repeatability of results
>>
>> correct datatypes:
>>
>> * "null" should *never be stored*
>> * int: public, gender, completion_percentage, AGE,
>> * long/time: last_login, registration
>> * optionally as label: gender, public
>>
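
The datatype notes above can be sketched as a small conversion step applied
before import. This is only an illustration in Python, not the code from the
repository: the function and constant names are hypothetical, the timestamp
format is taken from the sample data in this message, and treating the
timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone

# Field lists taken from the "correct datatypes" notes above.
INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}


def clean_profile(raw):
    """Drop "null" values and convert the typed fields (hypothetical helper)."""
    cleaned = {}
    for key, value in raw.items():
        if value is None or value == "null":
            continue  # "null" should never be stored
        if key in INT_FIELDS:
            cleaned[key] = int(value)
        elif key in TIME_FIELDS:
            # Format as in the sample data; UTC is an assumption.
            parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
            millis = parsed.replace(tzinfo=timezone.utc).timestamp() * 1000
            cleaned[key] = int(millis)  # epoch milliseconds, fits a long
        else:
            cleaned[key] = value
    return cleaned


# A few fields from the sample record in this message.
sample = {
    "_key": "P/P1", "public": "1", "completion_percentage": "14", "gender": "1",
    "last_login": "2012-05-25 11:20:00.0",
    "registration": "2005-04-03 00:00:00.0",
    "AGE": 26, "body_type": "null",
}
profile = clean_profile(sample)
print(sorted(profile))
```
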
>> -> test repository (WIP): with changes in *description.js and
>> benchmark.js*
>>
>> https://github.com/jexp/nosql-tests/tree/node-neo4j
>>
>> queries for neo4j-shell:
>>
>> export from="P/P1"
>> export to="P/P277"
>>
>> export key="P/P1"
>>
>> // warmup
>> MATCH ()--() return count(*);
>> // 61,245,128 rows
>>
>> MATCH (s:PROFILES) return count(*);
>> // 1,632,803 profiles
>> // 1.15 s
>>
>> profile
>>
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT
>> n._key;
>> // 295 rows 5 ms
>>
>>
>> // 1st degree neighbours
>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>> // 14 rows 1ms
>>
>> // 2nd degree neighbours
>> MATCH (s:PROFILES {_key:{key}})-->(x)
>> MATCH (x)-->(n:PROFILES)
>> RETURN DISTINCT n._key;
>> // 283 rows 6 ms
>>
>> // shortest path
>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}),
>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as
>> path;
>> // 1 ms; don't return the full data, only the keys, as in the other DBs
>>
>> // aggregation
>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>> // 22s -> should be rather 1.5s
>>
>> // single read
>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>> // or
>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>> // 1 row with 59 properties 1 ms
>>
>> // single writes
>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>
>> // delete all nodes with a certain label
>> // loop until returns 0
>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-()
>> DELETE n,r RETURN count(*) as deleted
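
The "loop until returns 0" step above could look roughly like this on the
client side. It is a sketch in Python under stated assumptions: `run_query`
is a hypothetical callable that sends one Cypher statement and returns the
integer in the `deleted` column, and no particular driver is implied.

```python
def delete_in_batches(run_query, label="PROFILES_TEMP", batch_size=5000):
    """Repeat the batched delete until a batch reports 0 rows.

    Returns the sum of the reported counts. Because of the OPTIONAL MATCH,
    count(*) counts (node, relationship) rows, not distinct nodes.
    """
    query = (
        f"MATCH (n:{label}) WITH n LIMIT {batch_size} "
        "OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) AS deleted"
    )
    total = 0
    while True:
        deleted = run_query(query)
        if deleted == 0:
            return total
        total += deleted


# Stand-in for a real database call, used only to exercise the loop.
fake_results = iter([5000, 5000, 123, 0])
total_rows = delete_in_batches(lambda cypher: next(fake_results))
print(total_rows)  # prints 10123
```

Running each batch as its own statement keeps every transaction small, which
is the point of the LIMIT in the query.
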
>> ----
>>
>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT
>> n._key as key RETURN count(*);
>> // 295 count 5-6ms
>>
>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>> // 140 rows -> 1502 ms that's how it should be
>>
>> sample data:
>>
>> _key:"P/P1",
>> public:"1",
>> completion_percentage:"14",
>> gender:"1",
>> region:"zilinsky kraj, zilina",
>> last_login:"2012-05-25 11:20:00.0",
>> registration:"2005-04-03 00:00:00.0",
>> AGE:26,
>> body:"185 cm, 90 kg",
>> I_am_working_in_field:"it",
>> spoken_languages:"anglicky",
>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia,
>> divadlo",
>> I_most_enjoy_good_food:"v dobrej restauracii",
>> pets:"mam psa",
>> body_type:"null",
>> my_eyesight:"null",
>> eye_color:"null",
>> hair_color:"null",
>> hair_type:"null",
>> completed_level_of_education:"null",
>> favourite_color:"null",
>> relation_to_smoking:"null",
>> relation_to_alcohol:"null",
>> sign_in_zodiac:"null",
>> on_pokec_i_am_looking_for:"null",
>> love_is_for_me:"null",
>> relation_to_casual_sex:"null",
>> my_partner_should_be:"null",
>> marital_status:"null",
>> children:"null",
>> relation_to_children:"null",
>> I_like_movies:"null",
>> I_like_watching_movie:"null",
>> I_like_music:"null",
>> I_mostly_like_listening_to_music:"null",
>> the_idea_of_good_evening:"null",
>> I_like_specialties_from_kitchen:"null",
>> fun:"null",
>> I_am_going_to_concerts:"null",
>> my_active_sports:"null",
>> my_passive_sports:"null",
>> profession:"null",
>> I_like_books:"null",
>> life_style:"null",
>> music:"null",
>> cars:"null",
>> politics:"null",
>> relationships:"null",
>> art_culture:"null",
>> hobbies_interests:"null",
>> science_technologies:"null",
>> computers_internet:"null",
>> education:"null",
>> sport:"null",
>> movies:"null",
>> travelling:"null",
>> health:"null",
>> companies_brands:"null",
>> more:"null"
>>
>>
>> neo4j-server.properties:
>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>> org.neo4j.server.webserver.port=8474
>> dbms.security.auth_enabled=false
>>
>>
>> neo4j-wrapper.conf:
>> wrapper.java.initmemory=8000
>> wrapper.java.maxmemory=8000
>> wrapper.java.additional=-Xmn2G
>>
>> neo4j.properties:
>> dbms.pagecache.memory=5G
>> keep_logical_logs=false
>> remote_shell_enabled=false
>> cache_type=soft
>> online_backup_enabled=false
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>