Re: [Neo4j] Tuning/ performance / predictability

Michael Hunger Mon, 10 Mar 2014 02:27:08 -0700

Hi Tom,

What I did was basically dump the nodes and rels from the 2.1 database using 
the shell tools into CSV files and then import them into 2.0 using the 
batch-importer (it would also have worked with the shell tools, but I was too 
lazy :)
It could be that I missed some.


I think you'd be faster importing your CSV data into 2.0 with the shell tools 
(use a batch-size of 1k for the relationships) or with the batch-importer.

You would do much the same as with LOAD CSV (adapt to the shape of your 
CSV files):

import-cypher -i nodes.csv create (n:#{label}) set n.jurt_id = {jurt_id}
import-cypher -i rels.csv match (n),(m) where id(n) = {start} and id(m) = {end} 
create (n)-[:#{label}]->(m)
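
A rough sketch of how the two kinds of placeholders in the import-cypher lines 
above are understood to behave: `{name}` is passed as a query parameter taken 
from the CSV column of that name, while `#{name}` is replaced textually before 
the statement runs (labels and relationship types cannot be parameters in 
Cypher). The Python below only illustrates that substitution; it is not the 
shell tools' code, and `render_statement` and the sample row are made up for 
the example.

```python
import csv
import io
import re


def render_statement(template, row):
    """Return (statement, params) for one CSV row.

    '#{col}' is replaced textually with the row value; '{col}' stays in the
    statement text and its value is returned in params instead.
    """
    statement = re.sub(r"#\{(\w+)\}", lambda m: row[m.group(1)], template)
    # Remaining '{col}' markers are ordinary query parameters.
    names = re.findall(r"(?<!#)\{(\w+)\}", statement)
    params = {name: row[name] for name in names}
    return statement, params


# Hypothetical one-row CSV with the two columns the node template uses.
template = "create (n:#{label}) set n.jurt_id = {jurt_id}"
sample_csv = "label,jurt_id\njurt,J70000\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

statement, params = render_statement(template, rows[0])
print(statement)
print(params)
```

The point of the split is that the label ends up in the statement text while 
the property value travels separately as a parameter.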

Michael

Here is what I did:

#1 built the shell tools for 2.1 -> 
s3://dist.neo4j.org/jexp/shell/neo4j-shell-tools-2.1.zip

Unzipped the zip in the lib directory of the 2.1 server

started the shell

bin/neo4j-shell -path ~/Downloads/tom/graph21.db

I ran the following 2 export commands, which took perhaps 20s in total (for a 
real export you'd probably export all node properties one by one):

import-cypher -o nodes.csv match(n) return id(n) as `:id`,labels(n)[0] as 
`:label`, n.jurt_id as jurt_id
Query: match(n) return id(n) as `:id`,labels(n)[0] as `:label`, n.jurt_id as 
jurt_id infile (none) delim ',' quoted false outfile nodes.csv batch-size 1000
Import statement execution created 28184 rows of output.

import-cypher -o nodes.csv match(n)-[r]->(m) return id(n) as `s:id`,id(m) as 
`e:id`,type(r) as `:label` 
Query: match(n)-[r]->(m) return id(n) as `s:id`,id(m) as `e:id`,type(r) as 
`:label` infile (none) delim ',' quoted false outfile nodes.csv batch-size 1000
Import statement execution created 1276254 rows of output.

Then I imported it with the batch-importer

import.sh tom.db nodes.csv rels.csv

using a properties file with 

batch_import.csv.delim=,

On 10.03.2014 at 10:08, Tom Zeppenfeldt <[email protected]> wrote:

> Hi Michael,
> 
> Thanks for your reply. Basically you suggest not overspecifying the queries, 
> by leaving out the labels or identifiers when they are not necessary. And I 
> learned my lesson with regard to using snapshots :)
> 
> BTW: Assuming that you are using the db that I shared with you and converted 
> it to a 2.0.1 version, I appreciate the increased speed, but does it also 
> explain why the returned counts are different?
> 
> If you have converted the db, could you share the datastore (the 2.0.1 one) 
> back to me?
> 
> Thanks a lot !
> 
> Best,  Tom 
> 
> On Monday, 10 March 2014 07:11:00 UTC+1, Michael Hunger wrote:
> Hi Tom,
> 
> With 2.0.1 the query time went down to 1.6 seconds.
> It still has to pull through and aggregate about 500,000 rels, but it should 
> actually be faster doing this.
> 
> match (j1:jurt)-[:HAS_TERM]->(t)<-[:HAS_TERM]-(j2) 
> where j1.jurt_id = 'J70000' AND j2 <> j1
> RETURN j2,count(*) as commonterms 
> order by commonterms desc 
> limit 3;
> 
> +---------------------------------------------+
> | j2                            | commonterms |
> +---------------------------------------------+
> | Node[19946]{jurt_id:"J72191"} | 68          |
> | Node[20977]{jurt_id:"J73483"} | 67          |
> | Node[21658]{jurt_id:"J74261"} | 64          |
> +---------------------------------------------+
> 3 rows
> 1614 ms
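
To illustrate why the query above touches so many relationships: every term of 
the start node is expanded to every other node sharing that term, and each 
matched path becomes one row before the aggregation collapses them. The Python 
below is only a toy model of that two-hop count; the data and the name 
`common_terms` are invented for the example and say nothing about how Neo4j 
executes the query internally.

```python
from collections import Counter

# Hypothetical toy data: jurt_id -> set of term ids reachable via HAS_TERM.
has_term = {
    "J1": {"a", "b", "c"},
    "J2": {"a", "b"},
    "J3": {"c"},
    "J4": {"x"},
}


def common_terms(has_term, start, limit=3):
    """Count shared terms between `start` and every other jurt, highest first."""
    # Invert the relationship: term -> jurts that have it.
    by_term = {}
    for jurt, terms in has_term.items():
        for term in terms:
            by_term.setdefault(term, set()).add(jurt)

    counts = Counter()
    for term in has_term[start]:      # first hop:  (j1)-[:HAS_TERM]->(t)
        for other in by_term[term]:   # second hop: (t)<-[:HAS_TERM]-(j2)
            if other != start:        # j2 <> j1
                counts[other] += 1    # one row per matched path
    return counts.most_common(limit)  # order by count desc, limit


print(common_terms(has_term, "J1"))
```

The inner loop runs once per matched path, which is why the row count before 
aggregation is far larger than the handful of rows returned.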
> 
> Cheers,
> 
> Michael
> 
> ----
> (michael}-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
> 
> On 09.03.2014 at 20:00, Michael Hunger <[email protected]> wrote:
> 
>> Ouch
>> 
>> Share via dropbox
>> 
>> You can share the 2.1 store with me or the loadcsv script with your csv files
>> 
>> Thanks for all the great feedback btw
>> 
>> Can you send me your postal address and t-shirt size?
>> 
>> Thx
>> 
>> Sent from mobile device
>> 
>> On 09.03.2014 at 19:08, Tom Zeppenfeldt <[email protected]> wrote:
>> 
>>> Ok Michael,
>>> 
>>> - Just a question that may sound stupid: What's the best way to share 
>>> things privately over here? I'm not seeing any clear option to do so.
>>> - I'll try to set up a server with 2.0.1 and try to use the 
>>> shell-import-tools. FYI: uploading the 1.2M rels using LOAD CSV took over 
>>> 24 hrs ... hope your shell-import-tools work faster ...
>>>  
>>> Best,
>>> 
>>> Tom
>>> 
>>> 
>>> Met vriendelijke groet / With kind regards
>>> 
>>> 
>>> 
>>> Ir. T. Zeppenfeldt
>>> van der Waalsstraat 30
>>> 6706 JR  Wageningen
>>> The Netherlands
>>> 
>>> Mobile: +31 6 23 28 78 06
>>> Phone: +31 3 17 84 22 17
>>> E-mail: [email protected]
>>> Web: www.ophileon.com
>>> Twitter: tomzeppenfeldt
>>> Skype: tomzeppenfeldt
>>> 
>>> 
>>> 2014-03-09 16:27 GMT+01:00 Michael Hunger <[email protected]>:
>>> Could you send me the profile output from the shell? It's easier to read 
>>> on mobile. Please also share the db with me privately.
>>> 
>>> Can you also try the query in 2.0.1?
>>> 
>>> You can import the data using my shell-import-tools
>>> 
>>> Or just generate textual cypher statements from load-csv
>>> 
>>> Sent from mobile device
>>> 
>>> On 09.03.2014 at 16:11, Tom Zeppenfeldt <[email protected]> wrote:
>>> 
>>>> query is executed as follows, in which I spot:
>>>> 
>>>>             "_rows" : 478380,
>>>>             "_db_hits" : 956760,
>>>> 
>>>> which is actually higher (= worse ??)  than the original .. 
>>>> 
>>>> {
>>>>   "columns" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
>>>>   "data" : [ [ "J70000", "J72191", 68 ], [ "J70000", "J73483", 67 ], [ "J70000", "J75683", 66 ] ],
>>>>   "plan" : {
>>>>     "args" : {
>>>>       "returnItemNames" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
>>>>       "_rows" : 3,
>>>>       "_db_hits" : 0,
>>>>       "symKeys" : [ "j1.jurt_id", "j2.jurt_id", "  INTERNAL_AGGREGATEb6207bc9-3236-4e8f-ad48-51d2d73e3372" ]
>>>>     },
>>>>     "dbHits" : 0,
>>>>     "name" : "ColumnFilter",
>>>>     "children" : [ {
>>>>       "args" : {
>>>>         "limit" : "Literal(3)",
>>>>         "orderBy" : [ "SortItem(Cached(  INTERNAL_AGGREGATEb6207bc9-3236-4e8f-ad48-51d2d73e3372 of type Integer),false)" ],
>>>>         "_rows" : 3,
>>>>         "_db_hits" : 0
>>>>       },
>>>>       "dbHits" : 0,
>>>>       "name" : "Top",
>>>>       "children" : [ {
>>>>         "args" : {
>>>>           "keys" : [ "Cached(j1.jurt_id of type Any)", "Cached(j2.jurt_id of type Any)" ],
>>>>           "_rows" : 9992,
>>>>           "aggregates" : [ "(  INTERNAL_AGGREGATEb6207bc9-3236-4e8f-ad48-51d2d73e3372,Count(t))" ],
>>>>           "_db_hits" : 0
>>>>         },
>>>>         "dbHits" : 0,
>>>>         "name" : "EagerAggregation",
>>>>         "children" : [ {
>>>>           "args" : {
>>>>             "_rows" : 478380,
>>>>             "_db_hits" : 956760,
>>>>             "exprKeys" : [ "j1.jurt_id", "j2.jurt_id" ],
>>>>             "symKeys" : [ "j1", "t", "  UNNAMED79", "j2", "  UNNAMED62" ]
>>>>           },
>>>>           "dbHits" : 956760,
>>>>           "name" : "Extract",
>>>>           "children" : [ {
>>>>             "args" : {
>>>>               "_rows" : 478380,
>>>>               "_db_hits" : 0,
>>>>               "pred" : "NOT(j2 == j1)"
>>>>             },
>>>>             "dbHits" : 0,
>>>>             "name" : "Filter",
>>>>             "children" : [ {
>>>>               "args" : {
>>>>                 "g" : "(j1)-['  UNNAMED62']-(t),(j2)-['  UNNAMED79']-(t)",
>>>>                 "_rows" : 478380,
>>>>                 "_db_hits" : 0
>>>>               },
>>>>               "dbHits" : 0,
>>>>               "name" : "SimplePatternMatcher",
>>>>               "children" : [ {
>>>>                 "args" : {
>>>>                   "identifiers" : [ "j1" ],
>>>>                   "query" : "{jurtid}",
>>>>                   "producer" : "SchemaIndex",
>>>>                   "_rows" : 1,
>>>>                   "property" : "jurt_id",
>>>>                   "label" : "jurt",
>>>>                   "_db_hits" : 0,
>>>>                   "identifier" : "j1"
>>>>                 },
>>>>                 "dbHits" : 0,
>>>>                 "name" : "SchemaIndex",
>>>>                 "children" : [ ],
>>>>                 "rows" : 1
>>>>               } ],
>>>>               "rows" : 478380
>>>>             } ],
>>>>             "rows" : 478380
>>>>           } ],
>>>>           "rows" : 478380
>>>>         } ],
>>>>         "rows" : 9992
>>>>       } ],
>>>>       "rows" : 3
>>>>     } ],
>>>>     "rows" : 3
>>>>   }
>>>> }
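
A small sketch for reading a profile like the one above: walk the nested 
"children" and list rows and dbHits per operator, so it is easy to see where 
the cost sits. It assumes only the "name", "rows", "dbHits" and "children" 
keys that appear in the output above; the function name `operator_costs` is 
made up, and the embedded JSON is a trimmed stand-in for the full response, 
not a complete copy.

```python
import json


def operator_costs(plan):
    """Yield (name, rows, dbHits) for a plan node and all its children, top down."""
    yield plan["name"], plan["rows"], plan["dbHits"]
    for child in plan.get("children", []):
        yield from operator_costs(child)


# Trimmed-down stand-in for the response above (only the fields used here).
response = json.loads("""
{"plan": {"name": "ColumnFilter", "rows": 3, "dbHits": 0, "children": [
  {"name": "Top", "rows": 3, "dbHits": 0, "children": [
    {"name": "EagerAggregation", "rows": 9992, "dbHits": 0, "children": [
      {"name": "Extract", "rows": 478380, "dbHits": 956760, "children": []}
    ]}
  ]}
]}}
""")

costs = list(operator_costs(response["plan"]))
for name, rows, hits in costs:
    print(f"{name:<20} rows={rows:>7} dbHits={hits:>7}")

total_hits = sum(hits for _, _, hits in costs)
print("total dbHits:", total_hits)
```

In the profile above all db hits sit in the Extract step, at exactly two per 
row (2 x 478380 = 956760), which would be consistent with reading j1.jurt_id 
and j2.jurt_id for every row before aggregating.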
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
> 
> 


Re: [Neo4j] Tuning/ performance / predictability
