I've updated the database dump on Amazon S3 following Michael's suggestion. I 
will rerun the tests as soon as Michael has finished his investigation.


Best,

  Frank


On Thursday, June 11, 2015 at 17:01:20 UTC+2, Frank Celler wrote:
>
> It worked perfectly. 
>
> On Thursday, June 11, 2015 at 15:36:30 UTC+2, Michael Hunger wrote:
>>
>> You forgot --id-type integer.
>>
>> The script actually takes care of it.
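The NullPointerException in the quoted log below most likely comes from `--id-type` being passed without a value, so the next token is another option instead of a value. A minimal Python sketch that flags such options, assuming a hypothetical helper and an illustrative `VALUE_OPTIONS` set built only from the flags used in this thread (not the full neo4j-import option list):

```python
import shlex

# Illustrative subset of neo4j-import options that expect a value
# (hypothetical list, taken only from the command quoted in this thread).
VALUE_OPTIONS = {"--into", "--id-type", "--delimiter", "--quote"}


def options_missing_values(argv):
    """Return value-taking options that are not followed by a value."""
    missing = []
    for i, arg in enumerate(argv):
        if arg not in VALUE_OPTIONS:
            continue
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        # Another "--option" (or the end of argv) means the value is missing.
        if nxt is None or nxt.startswith("--"):
            missing.append(arg)
    return missing


broken = shlex.split(
    "--into pokec.db --id-type --delimiter TAB --quote Ö "
    "--nodes:PROFILES profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz "
    "--relationships:RELATION relationships_header.txt,soc-pokec-relationships.txt.gz"
)
# Michael's fix: give --id-type the value "integer".
fixed = broken[:3] + ["integer"] + broken[3:]

print(options_missing_values(broken))
print(options_missing_values(fixed))
```

For the failing command this reports only `--id-type`; the corrected argument list reports nothing.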
>>
>> On June 11, 2015 at 14:55, Michael Hunger <[email protected]> wrote:
>>
>> I used 2.2.2, 2.3-M02, and 2.3-SNAPSHOT for the import.
>>
>> I can also provide you with the freshly imported databases. Let me know.
>>
>> Michael
>>
>> On June 11, 2015 at 14:45, Frank Celler <[email protected]> wrote:
>>
>> Hi Michael,
>>
>> thanks a lot for the import script. I'm currently trying to generate a
>> new database dump (with Neo4j 2.2.2 Community), but I get the following
>> error:
>>
>> $ bash -x ./import.sh 
>> ...
>> + rm -rf pokec.db
>> + ./bin/neo4j-import --into pokec.db --id-type --delimiter TAB --quote Ö --nodes:PROFILES profiles_header.txt,soc-pokec-profiles_no_null_sorted.txt.gz --relationships:RELATION relationships_header.txt,soc-pokec-relationships.txt.gz
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:575)
>>     at org.neo4j.tooling.ImportTool$6.apply(ImportTool.java:571)
>>     at org.neo4j.helpers.Args.interpretOption(Args.java:490)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:282)
>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:244)
>>
>> My Java version is:
>>
>> java version "1.8.0_45"
>> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>>
>> Do I need 2.3 for the import?
>>
>> Thanks
>>   Frank
>>
>> On Thursday, June 11, 2015 at 13:40:55 UTC+2, Michael Hunger wrote:
>>>
>>> I created an import script which I added to my repository.
>>> On my machine it imports the data in 35 seconds.
>>>
>>> It uses more sensible types for the fields and also skips all the
>>> null values.
>>>
>>> It also uses a numeric ID for the primary key, which makes more sense to
>>> me.
>>>
>>> If you optimized the dataset for Neo4j, you could even use the node ID as
>>> the primary key, since the input data has a sane, incrementing ID; that
>>> would be much faster.
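Reusing the node ID as the primary key relies on the input IDs being sane and incrementing. A minimal Python sketch of checking that property while streaming the sorted input, assuming the numeric IDs (for example 1, 2, 3 from "P/P1", "P/P2", "P/P3") have already been extracted; `ids_are_dense` is a hypothetical helper and says nothing about how neo4j-import itself assigns node IDs:

```python
def ids_are_dense(ids, start=1):
    """True if ids run start, start + 1, start + 2, ... with no gaps,
    repeats or reordering. Works on any iterable, so it can stream a file."""
    expected = start
    for value in ids:
        if value != expected:
            return False
        expected += 1
    return True


print(ids_are_dense([1, 2, 3, 4]))
print(ids_are_dense([1, 2, 4]))
```

Only when this holds for the whole file could the ID be mapped one-to-one onto a position without a lookup.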
>>>
>>> I also added a neo4j-pokec directory with queries that use that numeric ID
>>> as input (it should probably also use an input.json file that doesn't
>>> contain "Pxxx" strings; I'm not sure what the performance impact of
>>> converting those strings is).
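Converting the "Pxxx" keys up front avoids doing it per query. A minimal Python sketch, assuming every key follows the `P/P<digits>` pattern of the sample keys "P/P1" and "P/P277" used further down; `key_to_id` and `id_to_key` are hypothetical helper names:

```python
import re

# Assumed key format, based on sample keys such as "P/P1" and "P/P277".
KEY_PATTERN = re.compile(r"P/P(\d+)")


def key_to_id(key):
    """Convert a key like "P/P277" to the integer 277."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return int(match.group(1))


def id_to_key(numeric_id):
    """Inverse of key_to_id, e.g. 277 -> "P/P277"."""
    return f"P/P{numeric_id}"


print(key_to_id("P/P277"), id_to_key(1))
```

Rejecting unexpected keys with an error, rather than guessing, keeps a malformed input.json from silently mapping to the wrong profile.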
>>>
>>> Cheers, Michael
>>>
>>> https://github.com/jexp/nosql-tests/tree/my-import
>>>
>>> I did some preliminary testing:
>>>
>>> Neo4j 2.2
>>>
>>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>>> INFO using server address  127.0.0.1
>>> INFO start
>>> INFO executing shortest path for 19 paths
>>> INFO total paths length: 104
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *shortest* path, 19 items
>>> INFO Total Time for 19 requests: 85 ms
>>> INFO Average: *4.47 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing neighbors for 500 elements
>>> INFO total number of neighbors found: 9102
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *neighbors*, 500 items
>>> INFO Total Time for 500 requests: 428 ms
>>> INFO Average: *0.86 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing neighbors 2nd degree for 500 elements
>>> INFO total number of neighbors2 found: 545530
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *neighbors2*, 500 items
>>> INFO Total Time for 500 requests: 4850 ms
>>> INFO Average: *9.7 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing aggregation
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *aggregate*, 1 items
>>> INFO Total Time for 1 requests: 14036 ms
>>> INFO Average: *14036 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing single read with 100000 documents
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *single reads*, 100000 items
>>> INFO Total Time for 100000 requests: 83473 ms
>>> INFO Average: *0.83 ms*
>>> INFO -----------------------------------------------------------------------------
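The averages in these logs are the total time divided by the number of requests. A small Python sketch that parses the `Total Time` lines and recomputes the average, assuming the log format shown above; the regular expression and `parse_total_line` are illustrative, not part of the benchmark suite:

```python
import re

# Matches lines such as "INFO Total Time for 19 requests: 85 ms".
TOTAL_LINE = re.compile(r"Total Time for (\d+) requests: (\d+) ms")


def parse_total_line(line):
    """Return (requests, total_ms, average_ms), or None for other lines."""
    match = TOTAL_LINE.search(line)
    if match is None:
        return None
    requests, total = int(match.group(1)), int(match.group(2))
    return requests, total, total / requests


for line in [
    "INFO Total Time for 19 requests: 85 ms",
    "INFO Total Time for 500 requests: 4850 ms",
    "INFO Total Time for 100000 requests: 83473 ms",
]:
    requests, total, avg = parse_total_line(line)
    print(f"{requests} requests, {total} ms total, {avg:.2f} ms average")
```

Rounded to two decimals, the recomputed values agree with the `Average` lines in the log.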
>>>
>>>
>>> Neo4j 2.3
>>>
>>> node benchmark.js neo4j-pokec -t shortest,neighbors,neighbors2,aggregation,singleRead
>>> INFO using server address  127.0.0.1
>>> INFO start
>>> INFO executing shortest path for 19 paths
>>> INFO total paths length: 104
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *shortest* path, 19 items
>>> INFO Total Time for 19 requests: 69 ms
>>> INFO Average: *3.63 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing neighbors for 500 elements
>>> INFO total number of neighbors found: 9102
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *neighbors*, 500 items
>>> INFO Total Time for 500 requests: 431 ms
>>> INFO Average: *0.86 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing neighbors 2nd degree for 500 elements
>>> INFO total number of neighbors2 found: 545530
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *neighbors2*, 500 items
>>> INFO Total Time for 500 requests: 3441 ms
>>> INFO Average: *6.88 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing aggregation
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *aggregate*, 1 items
>>> INFO Total Time for 1 requests: 2848 ms
>>> INFO Average: *2848 ms*
>>> INFO -----------------------------------------------------------------------------
>>> INFO executing single read with 100000 documents
>>> INFO -----------------------------------------------------------------------------
>>> INFO Neo4J: *single reads*, 100000 items
>>> INFO Total Time for 100000 requests: 77760 ms
>>> INFO Average: *0.78 ms*
>>> INFO -----------------------------------------------------------------------------
>>> DONE
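To compare the two runs, the ratio of the Neo4j 2.2 total time to the 2.3 total time per test can be computed directly from the numbers reported above (a ratio above 1 means 2.3 was faster). A short Python sketch; `speedups` is an illustrative helper:

```python
# Total times in ms, copied from the Neo4j 2.2 and 2.3 benchmark runs above.
totals_22 = {"shortest": 85, "neighbors": 428, "neighbors2": 4850,
             "aggregate": 14036, "single reads": 83473}
totals_23 = {"shortest": 69, "neighbors": 431, "neighbors2": 3441,
             "aggregate": 2848, "single reads": 77760}


def speedups(before, after):
    """Ratio before/after per test; > 1 means the second run was faster."""
    return {name: before[name] / after[name] for name in before}


for name, ratio in speedups(totals_22, totals_23).items():
    print(f"{name:>12}: {ratio:.2f}x")
```

With these totals the aggregate query shows by far the largest difference (about 4.9x), while the neighbors test is essentially unchanged (about 0.99x).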
>>>
>>>
>>> On June 10, 2015 at 18:55, Frank Celler <[email protected]> wrote:
>>>
>>> Hi Michael,
>>>
>>> thanks for sharing your preliminary findings. I'll incorporate them into
>>> the benchmark suite and rerun the tests. I've seen that there is a 30-day
>>> trial for the Enterprise Edition, so I can test that as well.
>>>
>>> Is it possible to upload the database where you changed the AGE
>>> attribute? Or is there an easy Cypher command to change the type?
>>>
>>> Thanks
>>>   Frank
>>>
>>>
>>> On Wednesday, June 10, 2015 at 17:27:05 UTC+2, Michael Hunger wrote:
>>>>
>>>> I also did some experiments but haven't had time to finish them yet.
>>>> Here are my observations so far:
>>>>
>>>> *ArangoDB Measurement*
>>>>
>>>> - index -> constraint `CREATE CONSTRAINT ON (p:PROFILES) ASSERT p._key IS UNIQUE;`
>>>> - seraph -> replace with node-neo4j 2.0.RC1
>>>>   - uses the 2-year-old /cypher API, doesn't send the X-Stream:true header
>>>>   - does not do efficient auth (encodes credentials on every call)
>>>>   - doesn't do pooling
>>>> - suboptimal queries
>>>> - make sure the concurrency level is adequate for the setup (utilize
>>>>   all cores but don't flood, e.g. use async.eachWithLimit)
>>>> - warm up with nodes and rels: `MATCH ()--() return count(*);`
>>>> - Enterprise has better vertical read/write scalability than Community
>>>> - use 12G-24G heap, 2G new gen (-Xmn2G)
>>>> - set the pagecache to 2.5G + growth (e.g. another 2.5G)
>>>> - in 2.2, set cache_type=soft or cache_type=none depending on
>>>>   available heap
>>>> - fix property encoding, e.g. AGE as int, not string; don't store "null"!!
>>>>   -> affects esp. the aggregate query
>>>> - don't re-run the benchmark on the same store; start from the initial one
>>>>   -> creating and deleting the additional PROFILES_TEMP nodes affects
>>>>   repeatability of results
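The concurrency advice above is about capping the number of in-flight requests instead of firing all of them at once. The benchmark itself is Node.js, so this is only an analogous Python asyncio sketch: a semaphore lets at most `limit` calls of a hypothetical `send_request` coroutine run at the same time (the limit of 4 is an arbitrary example, not a recommended value):

```python
import asyncio


async def each_with_limit(items, limit, worker):
    """Run worker(item) for every item, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather keeps the results in input order.
    return await asyncio.gather(*(guarded(item) for item in items))


async def main():
    in_flight = 0
    peak = 0

    async def send_request(key):
        # Hypothetical stand-in for one benchmark request to the database.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return key

    results = await each_with_limit(range(20), 4, send_request)
    return results, peak


results, peak = asyncio.run(main())
print(f"{len(results)} requests, peak concurrency {peak}")
```

Tuning `limit` to the available cores lets the server stay busy without queuing an unbounded number of requests.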
>>>>
>>>> correct datatypes:
>>>>
>>>> * "null" should *never be stored*
>>>> * int: public, gender, completion_percentage, AGE
>>>> * long/time: last_login, registration
>>>> * optionally as label: gender, public
>>>>
>>>>   -> test repository (WIP) with changes in *description.js and
>>>>   benchmark.js*:
>>>>
>>>> https://github.com/jexp/nosql-tests/tree/node-neo4j
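The datatype list above can be applied to each profile before import. A minimal Python sketch for one record, assuming the raw values arrive as strings like in the sample data below, that timestamps look like `2012-05-25 11:20:00.0`, and that "long/time" means epoch milliseconds interpreted as UTC (the timezone and the `clean_profile` name are assumptions):

```python
from datetime import datetime, timezone

INT_FIELDS = {"public", "gender", "completion_percentage", "AGE"}
TIME_FIELDS = {"last_login", "registration"}


def to_epoch_millis(text):
    """Parse '2012-05-25 11:20:00.0' as UTC (assumption) and return epoch ms."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


def clean_profile(raw):
    """Drop "null" values and convert the int and time fields listed above."""
    cleaned = {}
    for field, value in raw.items():
        if value == "null":
            continue  # "null" should never be stored
        if field in INT_FIELDS:
            value = int(value)
        elif field in TIME_FIELDS:
            value = to_epoch_millis(value)
        cleaned[field] = value
    return cleaned


sample = {"_key": "P/P1", "public": "1", "completion_percentage": "14",
          "gender": "1", "last_login": "2012-05-25 11:20:00.0",
          "registration": "2005-04-03 00:00:00.0", "AGE": 26,
          "body_type": "null", "pets": "mam psa"}
print(clean_profile(sample))
```

Skipping "null" entirely, rather than storing an empty string, keeps those properties absent, which is what the list asks for.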
>>>>
>>>> queries for neo4j-shell:
>>>>
>>>> export from="P/P1"
>>>> export to="P/P277"
>>>>
>>>> export key="P/P1"
>>>>
>>>> // warmup
>>>> MATCH ()--() return count(*);
>>>> // 61,245,128 rows
>>>>
>>>> MATCH (s:PROFILES) return count(*);
>>>> // 1,632,803 profiles
>>>> // 1.15 s
>>>>
>>>> profile
>>>>
>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) RETURN DISTINCT 
>>>> n._key;
>>>> // 295 rows 5 ms
>>>>
>>>>
>>>> // 1st degree neighbours
>>>> MATCH (:PROFILES {_key:{key}})-->(n) RETURN n._key;
>>>> // 14 rows 1ms 
>>>>
>>>> // 2nd degree neighbours
>>>> MATCH (s:PROFILES {_key:{key}})-->(x)
>>>> MATCH (x)-->(n:PROFILES)
>>>> RETURN DISTINCT n._key;
>>>> // 283 rows 6 ms
>>>>
>>>> // shortest path
>>>> MATCH (s:PROFILES {_key:{from}}),(t:PROFILES {_key:{to}}), 
>>>> p = shortestPath((s)-[*..15]->(t)) RETURN [x in nodes(p) | x._key] as 
>>>> path;
>>>> // 1 ms; don't return the full data, only the keys, like in the other DBs
>>>>
>>>> // aggregation
>>>> MATCH (f:PROFILES) RETURN f.AGE, count(*);
>>>> // 22 s -> should be more like 1.5 s
>>>>
>>>> // single read
>>>> MATCH (f:PROFILES) WHERE f._key = {key} RETURN f;
>>>> // or
>>>> MATCH (s:PROFILES {_key:{key}}) RETURN s;
>>>> // 1 row with 59 properties 1 ms
>>>>
>>>> // single writes
>>>> CREATE (s:PROFILES_TEMP {data}) RETURN id(s);
>>>>
>>>> // delete all nodes with a certain label
>>>> // loop until returns 0
>>>> MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 OPTIONAL MATCH (n)-[r]-() 
>>>> DELETE n,r RETURN count(*) as deleted
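The "loop until returns 0" step can be scripted. A minimal Python sketch, assuming a hypothetical `run_query(statement)` callable that executes one Cypher statement and returns the `deleted` value of its single result row; no particular driver API is implied:

```python
DELETE_BATCH = (
    "MATCH (n:PROFILES_TEMP) WITH n LIMIT 5000 "
    "OPTIONAL MATCH (n)-[r]-() DELETE n,r RETURN count(*) as deleted"
)


def delete_in_batches(run_query, max_rounds=10_000):
    """Repeat the batched delete until it reports 0 and return the sum.

    count(*) counts result rows, so for nodes with relationships the
    sum can be larger than the number of deleted nodes.
    """
    total = 0
    for _ in range(max_rounds):
        deleted = run_query(DELETE_BATCH)
        if deleted == 0:
            return total
        total += deleted
    raise RuntimeError("delete loop did not reach 0")


# Hypothetical fake runner: 12,345 temp nodes without relationships.
remaining = [12345]


def fake_run_query(statement):
    batch = min(5000, remaining[0])
    remaining[0] -= batch
    return batch


print(delete_in_batches(fake_run_query))
```

The `max_rounds` guard only protects against a runner that never reports 0; it is not part of the original query.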
>>>> ----
>>>>
>>>> MATCH (s:PROFILES {_key:{key}})-[*1..2]->(n:PROFILES) WITH DISTINCT 
>>>> n._key as key RETURN count(*);
>>>> // 295 count 5-6ms
>>>>
>>>> MATCH (f:PROFILES) return id(f) % 140, count(*);
>>>> // 140 rows -> 1502 ms, that's how it should be
>>>>
>>>> sample data:
>>>>
>>>> _key:"P/P1",
>>>> public:"1",
>>>> completion_percentage:"14",
>>>> gender:"1",
>>>> region:"zilinsky kraj, zilina",
>>>> last_login:"2012-05-25 11:20:00.0",
>>>> registration:"2005-04-03 00:00:00.0",
>>>> AGE:26,
>>>> body:"185 cm, 90 kg",
>>>> I_am_working_in_field:"it",
>>>> spoken_languages:"anglicky",
>>>> hobbies:"sportovanie, spanie, kino, jedlo, pocuvanie hudby, priatelia, 
>>>> divadlo",
>>>> I_most_enjoy_good_food:"v dobrej restauracii",
>>>> pets:"mam psa",
>>>> body_type:"null",
>>>> my_eyesight:"null",
>>>> eye_color:"null",
>>>> hair_color:"null",
>>>> hair_type:"null",
>>>> completed_level_of_education:"null",
>>>> favourite_color:"null",
>>>> relation_to_smoking:"null",
>>>> relation_to_alcohol:"null",
>>>> sign_in_zodiac:"null",
>>>> on_pokec_i_am_looking_for:"null",
>>>> love_is_for_me:"null",
>>>> relation_to_casual_sex:"null",
>>>> my_partner_should_be:"null",
>>>> marital_status:"null",
>>>> children:"null",
>>>> relation_to_children:"null",
>>>> I_like_movies:"null",
>>>> I_like_watching_movie:"null",
>>>> I_like_music:"null",
>>>> I_mostly_like_listening_to_music:"null",
>>>> the_idea_of_good_evening:"null",
>>>> I_like_specialties_from_kitchen:"null",
>>>> fun:"null",
>>>> I_am_going_to_concerts:"null",
>>>> my_active_sports:"null",
>>>> my_passive_sports:"null",
>>>> profession:"null",
>>>> I_like_books:"null",
>>>> life_style:"null",
>>>> music:"null",
>>>> cars:"null",
>>>> politics:"null",
>>>> relationships:"null",
>>>> art_culture:"null",
>>>> hobbies_interests:"null",
>>>> science_technologies:"null",
>>>> computers_internet:"null",
>>>> education:"null",
>>>> sport:"null",
>>>> movies:"null",
>>>> travelling:"null",
>>>> health:"null",
>>>> companies_brands:"null",
>>>> more:"null"
>>>>
>>>>
>>>> neo4j-server.properties:
>>>> org.neo4j.server.database.location=/Users/mh/support/arangodb/db/data
>>>> org.neo4j.server.webserver.port=8474
>>>> dbms.security.auth_enabled=false
>>>>
>>>>
>>>> neo4j-wrapper.conf:
>>>> wrapper.java.initmemory=8000
>>>> wrapper.java.maxmemory=8000
>>>> wrapper.java.additional=-Xmn2G
>>>>
>>>> neo4j.properties:
>>>> dbms.pagecache.memory=5G
>>>> keep_logical_logs=false
>>>> remote_shell_enabled=false
>>>> cache_type=soft
>>>> online_backup_enabled=false
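A quick way to sanity-check these settings is to add up heap and page cache. A Python sketch, assuming `wrapper.java.maxmemory` is given in megabytes and `dbms.pagecache.memory` uses a `G`/`M` suffix as shown above; `parse_properties` and `size_in_mb` are hypothetical helpers:

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values like -Xmn2G keep their own text
        props[key.strip()] = value.strip()
    return props


def size_in_mb(value):
    """Convert '5G', '512M' or a plain number (assumed MB) to megabytes."""
    units = {"G": 1024, "M": 1}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


wrapper = parse_properties("wrapper.java.initmemory=8000\n"
                           "wrapper.java.maxmemory=8000\n"
                           "wrapper.java.additional=-Xmn2G\n")
neo4j = parse_properties("dbms.pagecache.memory=5G\ncache_type=soft\n")

heap_mb = size_in_mb(wrapper["wrapper.java.maxmemory"])
pagecache_mb = size_in_mb(neo4j["dbms.pagecache.memory"])
print(f"heap {heap_mb} MB + page cache {pagecache_mb} MB = {heap_mb + pagecache_mb} MB")
```

Under these assumptions the configuration above needs about 13 GB for heap plus page cache, before OS and file-system overhead.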
>>>>
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>>
>>
>>
>>
