Ice-9, you can use the transaction support in the latest Neo4p (> 0.2220):

    REST::Neo4p->begin_work;
    # ... queries and stuff ...
    if ($something_bad_happened) {
        REST::Neo4p->rollback;
    }
    else {
        REST::Neo4p->commit;
    }

(The ../cypher endpoint is used by default; ../transaction is used within a transaction opened as above.)

On Wed, Jan 1, 2014 at 6:56 PM, Michael Hunger <[email protected]> wrote:

> CC'ing the Marks here.
>
> I don't know which endpoint Neo4p uses by default.
>
> As far as I can see, you also run just one query per HTTP request / tx?
> Usually you want to batch a score of them in a single tx (e.g. 20k elements).
>
> To run your code we probably need the xls files as well :)
>
> Also, your query below shouldn't run so long, even if you have some 2 million entries in your db.
>
> Any chance to share your db with me?
>
> Am 02.01.2014 um 00:14 schrieb icenine <[email protected]>:
>
> Hi Michael,
>
> I've posted the newest code here: http://pastebin.com/7PikkZRP
>
> I've switched all of my CREATE UNIQUE statements to MERGE. I'm still convinced, though, that the inserts are under-performing: it's taking about 5 minutes for all of the statements in that code to execute 500 times, and over time this gap grows to around 8, even up to 12, minutes.
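[Editor's note: the batched, parameterized pattern Michael recommends might look roughly like this in Neo4p. This is a sketch, not code from the pastebin: the Cypher text, parameter names, and the @rows data source are illustrative, and it assumes REST::Neo4p::Query's parameterized execute interface.]

```perl
use strict;
use warnings;
use REST::Neo4p;
use REST::Neo4p::Query;

REST::Neo4p->connect('http://127.0.0.1:7474');

# One parameterized MERGE, reused for every row, so the server can
# cache the query plan instead of re-parsing per-row Cypher strings.
my $query = REST::Neo4p::Query->new(
    'MERGE (i:INCIDENT { incident_id: {incident_id} }) '
  . 'MERGE (d:DEGREE_OF_HARM { degree_of_harm: {harm} }) '
  . 'MERGE (i)-[:HAS_INCIDENT_DEGREE_OF_HARM]->(d)'
);

my @rows;                   # populated from the parsed spreadsheet (illustrative)
my $batch_size = 20_000;    # Michael's suggested commit interval
my $n = 0;

REST::Neo4p->begin_work;    # queries now go to the ../transaction endpoint
for my $row (@rows) {
    $query->execute(incident_id => $row->{id}, harm => $row->{harm});
    # Commit periodically rather than once per statement.
    if (++$n % $batch_size == 0) {
        REST::Neo4p->commit;
        REST::Neo4p->begin_work;
    }
}
REST::Neo4p->commit;        # flush the final partial batch
```

Note that nothing is RETURNed: as Michael says below, when you're just creating data there's no need to ship results back over the wire.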
> My biggest bottleneck seems to be the incident nodes and their relationships:
>
> neo4j-sh (?)$ START n=node(*) MATCH (n)-[r]-() RETURN DISTINCT labels(n), type(r), count(*) ORDER BY labels(n)[0], type(r);
> +------------------------------------------------------------------------+
> | labels(n)               | type(r)                          | count(*)  |
> +------------------------------------------------------------------------+
> | ["DEGREE_OF_HARM"]      | "HAS_INCIDENT_DEGREE_OF_HARM"    | 2120424   |
> | ["DEGREE_OF_HARM"]      | "HAS_NRLS_DATA_TYPE"             | 7         |
> | ["INCIDENT"]            | "HAS_INCIDENT_CATEGORY"          | 2120457   |
> | ["INCIDENT"]            | "HAS_INCIDENT_DEGREE_OF_HARM"    | 2120424   |
> | ["INCIDENT"]            | "HAS_INCIDENT_PATIENT"           | 2120432   |
> | ["INCIDENT"]            | "HAS_INCIDENT_REPORTER"          | 2120442   |
> | ["INCIDENT"]            | "HAS_INCIDENT_SPECIALITY"        | 2120450   |
> | ["INCIDENT"]            | "HAS_NRLS_DATA_TYPE"             | 2120486   |
> | ["INCIDENT"]            | "IS_NHS_TRUST_INCIDENT"          | 2120483   |
> | ["INCIDENT"]            | "IS_NHS_TRUST_LOCATION_INCIDENT" | 2114664   |
> | ["INCIDENT_CATEGORY"]   | "HAS_INCIDENT_CATEGORY"          | 2120457   |
> | ["INCIDENT_CATEGORY"]   | "HAS_NRLS_DATA_TYPE"             | 16        |
> | ["INCIDENT_REPORTER"]   | "HAS_INCIDENT_REPORTER"          | 2120442   |
> | ["INCIDENT_REPORTER"]   | "HAS_NRLS_DATA_TYPE"             | 12        |
> | ["INCIDENT_SPECIALITY"] | "HAS_INCIDENT_SPECIALITY"        | 2120450   |
> | ["INCIDENT_SPECIALITY"] | "HAS_NRLS_DATA_TYPE"             | 17        |
> | ["NHS_TRUST"]           | "HAS_NHS_TRUST_LOCATION"         | 480       |
> | ["NHS_TRUST"]           | "HAS_NRLS_DATA_TYPE"             | 63        |
> | ["NHS_TRUST"]           | "IS_NHS_TRUST_INCIDENT"          | 2120483   |
> | ["NHS_TRUST_LOCATION"]  | "HAS_NHS_TRUST_LOCATION"         | 480       |
> | ["NHS_TRUST_LOCATION"]  | "IS_NHS_TRUST_LOCATION_INCIDENT" | 2114664   |
> | ["NRLS_DATA_TYPE"]      | "HAS_NRLS_DATA_TYPE"             | 2123426   |
> | ["PATIENT"]             | "HAS_INCIDENT_PATIENT"           | 2120432   |
> | ["PATIENT"]             | "HAS_NRLS_DATA_TYPE"             | 2825      |
> +------------------------------------------------------------------------+
> 24 rows
> 247418 ms
>
> MERGE seems to be slightly more consistent in performance than CREATE UNIQUE, though not that much faster.
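[Editor's note: for context, a per-incident MERGE of the kind being timed here would look roughly like the following. The property values are illustrative; each MERGE matches or creates on the unique-constrained key, then the final MERGE matches or creates the relationship between the two bound nodes.]

```cypher
MERGE (i:INCIDENT { incident_id: 12345 })
MERGE (d:DEGREE_OF_HARM { degree_of_harm: 'Low' })
MERGE (i)-[:HAS_INCIDENT_DEGREE_OF_HARM]->(d);
```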
> I've tried the following to tune the instance (note I have 8G of RAM on the VM and there's nothing else using it besides Neo4j and my extract process, which never takes up much more than 100M of RAM now that I've tuned it with MAJ's suggestions):
>
> cache_type=hpc
> node_cache_array_fraction=6
> relationship_cache_array_fraction=7
> #node_cache_size=1024
> relationship_cache_size=2G
>
> I haven't bothered tuning node_cache_size itself, since it's my relationship store that seems to be the biggest access point: accessing the node count takes 14 seconds, but accessing a relationship count takes around 2-3 minutes.
>
> Current heap usage after a restart, while the script is running and after processing 1000 rows, is ~500M.
>
> Current neostore sizes are:
>
> [root@miyu graph.db]# ls -l neostore* | awk '{printf("%10s %s\n", $5, $9)}'
>         63 neostore
>          9 neostore.id
>         55 neostore.labeltokenstore.db
>          9 neostore.labeltokenstore.db.id
>        456 neostore.labeltokenstore.db.names
>          9 neostore.labeltokenstore.db.names.id
>   29850422 neostore.nodestore.db
>          9 neostore.nodestore.db.id
>         68 neostore.nodestore.db.labels
>          9 neostore.nodestore.db.labels.id
>  177676780 neostore.propertystore.db
>        128 neostore.propertystore.db.arrays
>          9 neostore.propertystore.db.arrays.id
>          9 neostore.propertystore.db.id
>        162 neostore.propertystore.db.index
>          9 neostore.propertystore.db.index.id
>        722 neostore.propertystore.db.index.keys
>          9 neostore.propertystore.db.index.keys.id
>  679805312 neostore.propertystore.db.strings
>          9 neostore.propertystore.db.strings.id
>  560433951 neostore.relationshipstore.db
>          9 neostore.relationshipstore.db.id
>         45 neostore.relationshiptypestore.db
>          9 neostore.relationshiptypestore.db.id
>        380 neostore.relationshiptypestore.db.names
>          9 neostore.relationshiptypestore.db.names.id
>       1600 neostore.schemastore.db
>          9 neostore.schemastore.db.id
>
> Current cached mappings settings are:
>
> neostore.nodestore.db.mapped_memory=50M
> neostore.relationshipstore.db.mapped_memory=756M
> neostore.propertystore.db.mapped_memory=300M
> neostore.propertystore.db.strings.mapped_memory=756M
> neostore.propertystore.db.arrays.mapped_memory=50M
>
> Current initial heap settings are:
>
> # Initial Java Heap Size (in MB)
> wrapper.java.initmemory=2048
>
> # Maximum Java Heap Size (in MB)
> wrapper.java.maxmemory=5632
>
> Current schema:
>
> neo4j-sh (?)$ schema
> Welcome to the Neo4j Shell! Enter 'help' for a list of commands
> [Reconnected to server]
> Indexes
>   ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE (for uniqueness constraint)
>   ON :INCIDENT(incident_description)           ONLINE
>   ON :INCIDENT(incident_timestamp)             ONLINE
>   ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint)
>   ON :INCIDENT_CATEGORY(category_level_01)     ONLINE (for uniqueness constraint)
>   ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE (for uniqueness constraint)
>   ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE (for uniqueness constraint)
>   ON :NHS_TRUST(name)                          ONLINE (for uniqueness constraint)
>   ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE (for uniqueness constraint)
>   ON :NRLS_DATA_TYPE(code)                     ONLINE (for uniqueness constraint)
>   ON :PATIENT(patient_age)                     ONLINE
>   ON :PATIENT(patient_sex)                     ONLINE
>   ON :PATIENT(patient_ethnicity)               ONLINE
>
> Constraints
>   ON (nrls_data_type:NRLS_DATA_TYPE) ASSERT nrls_data_type.code IS UNIQUE
>   ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE
>   ON (degree_of_harm:DEGREE_OF_HARM) ASSERT degree_of_harm.degree_of_harm IS UNIQUE
>   ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
>   ON (nhs_trust_location:NHS_TRUST_LOCATION) ASSERT nhs_trust_location.location_level_01 IS UNIQUE
>   ON (incident_reporter:INCIDENT_REPORTER) ASSERT incident_reporter.reporter_level_01 IS UNIQUE
>   ON (incident_category:INCIDENT_CATEGORY) ASSERT incident_category.category_level_01 IS UNIQUE
>   ON (incident_speciality:INCIDENT_SPECIALITY) ASSERT
>       incident_speciality.speciality_level_01 IS UNIQUE
>
> I'm going to keep trying to tweak, but since I can't use property index hints with my MERGE statements (which I think would help with the incident relationships), I'm just loading anyway so I can get this done, as I've been at it for a while.
>
> If you have any further suggestions (or anyone else does), I'd be glad to try them out.
>
> ~ icenine
>
> On Wednesday, January 1, 2014 10:33:33 PM UTC, Michael Hunger wrote:
>>
>> Great!
>>
>> Looks good.
>>
>> I think if you use parameters with Neo4p's Cypher support (passing in Perl hashes for the parameters) and use the transactional endpoint with your import data, it shouldn't take too long to import your 2 million data points.
>>
>> #1 parameters
>> #2 transactional endpoint
>> #3 sensible batch size (e.g. 20k per commit)
>> #4 usually, when just creating data, you don't have to return anything
>>
>> Cheers
>>
>> Michael
>>
>> Am 01.01.2014 um 19:48 schrieb JDS <[email protected]>:
>>
>> BTW, I love the simplicity of something like this:
>>
>> neo4j-sh (?)$ MATCH (ndt:NRLS_DATA_TYPE { code : 'IN05_lvl1' })
>> > MERGE (ic:INCIDENT_CATEGORY { category_level_01 : 'FOOBAR' })-[r:HAS_NRLS_DATA_TYPE]->(ndt)
>> > RETURN ic, r;
>> +------------------------------------------------------------------------------+
>> | ic                                         | r                               |
>> +------------------------------------------------------------------------------+
>> | Node[2121668]{category_level_01:"FOOBAR"}  | :HAS_NRLS_DATA_TYPE[16880045]{} |
>> +------------------------------------------------------------------------------+
>> 1 row
>> Nodes created: 1
>> Relationships created: 1
>> Properties set: 1
>>
>> On Wednesday, January 1, 2014 6:36:04 PM UTC, Michael Hunger wrote:
>>>
>>> No worries, it's still early in the New Year :)
>>>
>>> But you definitely want to write a blog post about what you're doing with Neo4j, right?
>>>
>>> Happy New Year
>>>
>>> Michael
>>>
>>> Am 01.01.2014 um 19:33 schrieb JDS <[email protected]>:
>>>
>>> Ugh *shame*
>>>
>>> Thanks Mike
>>>
>>> On Wednesday, January 1, 2014 6:32:16 PM UTC, Michael Hunger wrote:
>>>>
>>>> Typo:
>>>>
>>>> In query #4 you use "NRLS_DATA_TYPE"; in the previous ones you use "NLRS_DATA_TYPE".
>>>>
>>>> N_RL_S vs. N_LR_S
>>>>
>>>> HTH
>>>>
>>>> Michael
>>>>
>>>> Am 01.01.2014 um 19:26 schrieb JDS <[email protected]>:
>>>>
>>>> Maybe I'm wrong, but I thought that all three of the top queries would return data, based on what the 1st and 4th queries return, so I'm a little confused. Server is 2.0.0 enterprise stable.
>>>>
>>>> neo4j-sh (?)$ START n=node(*) WHERE HAS (n.code) AND n.code = 'IN05_lvl1' RETURN n.code;
>>>> +-------------+
>>>> | n.code      |
>>>> +-------------+
>>>> | "IN05_lvl1" |
>>>> +-------------+
>>>> 1 row
>>>> 102405 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE) WHERE ndt.code = 'IN05_lvl1' RETURN ndt.code;
>>>> +----------+
>>>> | ndt.code |
>>>> +----------+
>>>> +----------+
>>>> 0 row
>>>> 31 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE { code : 'IN05_lvl1' }) RETURN ndt.code;
>>>> +----------+
>>>> | ndt.code |
>>>> +----------+
>>>> +----------+
>>>> 0 row
>>>> 20 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NRLS_DATA_TYPE) RETURN ndt.code;
>>>> +-------------------+
>>>> | ndt.code          |
>>>> +-------------------+
>>>> | "RP07"            |
>>>> | "IN07"            |
>>>> | "Age_at_Incident" |
>>>> | "ST01_LVL1"       |
>>>> | "PD09"            |
>>>> | "PD05_lvl1"       |
>>>> | "IN05_lvl1"       |
>>>> | "IN03_lvl1"       |
>>>> | "IN07_01MMYY"     |
>>>> | "PD11"            |
>>>> | "IN02_A_01"       |
>>>> | "IN01"            |
>>>> | "PD02"            |
>>>> +-------------------+
>>>> 13 rows
>>>> 113 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE { code : "IN05_lvl1" }) RETURN ndt.code;
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
