Ice-9, you can use the transaction support in the latest Neo4p (> 0.2220):

    REST::Neo4p->begin_work;
    # ... queries and stuff ...
    if ($something_bad_happened) {
        REST::Neo4p->rollback;
    }
    else {
        REST::Neo4p->commit;
    }

(The ../cypher endpoint is used by default; ../transaction is used within a transaction opened as above.)

On Wed, Jan 1, 2014 at 6:56 PM, Michael Hunger <[email protected]> wrote:

> CC'ing the Marks here.
>
> I don't know which endpoint Neo4p uses by default.
>
> As far as I can see, you also run just one query per HTTP request / tx?
> Usually you want to batch a score of them in a single tx (e.g. 20k elements).
>
> To run your code we probably need the xls files as well :)
>
> Also, your query below shouldn't run so long, even if you have some 2 million entries in your db.
>
> Any chance to share your db with me?
>
> Am 02.01.2014 um 00:14 schrieb icenine <[email protected]>:
>
> Hi Michael,
>
> I've posted the newest code here: http://pastebin.com/7PikkZRP
>
> I've switched all of my CREATE UNIQUE statements to MERGE. I'm still convinced, though, that the inserts are under-performing: it's taking about 5 minutes for all of the statements in that code to execute 500 times, and over time this gap grows to around 8, even up to 12, minutes.
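[Editor's note: the batched, parameterized pattern Michael recommends might look roughly like this in Neo4p. This is a sketch, not code from the pastebin: the Cypher text, parameter names, and the @rows data source are illustrative, and it assumes REST::Neo4p::Query's parameterized execute interface.]

```perl
use strict;
use warnings;
use REST::Neo4p;
use REST::Neo4p::Query;

REST::Neo4p->connect('http://127.0.0.1:7474');

# One parameterized MERGE, reused for every row, so the server can
# cache the query plan instead of re-parsing per-row Cypher strings.
my $query = REST::Neo4p::Query->new(
    'MERGE (i:INCIDENT { incident_id: {incident_id} }) '
  . 'MERGE (d:DEGREE_OF_HARM { degree_of_harm: {harm} }) '
  . 'MERGE (i)-[:HAS_INCIDENT_DEGREE_OF_HARM]->(d)'
);

my @rows;                   # populated from the parsed spreadsheet (illustrative)
my $batch_size = 20_000;    # Michael's suggested commit interval
my $n = 0;

REST::Neo4p->begin_work;    # queries now go to the ../transaction endpoint
for my $row (@rows) {
    $query->execute(incident_id => $row->{id}, harm => $row->{harm});
    # Commit periodically rather than once per statement.
    if (++$n % $batch_size == 0) {
        REST::Neo4p->commit;
        REST::Neo4p->begin_work;
    }
}
REST::Neo4p->commit;        # flush the final partial batch
```

Note that nothing is RETURNed: as Michael says below, when you're just creating data there's no need to ship results back over the wire.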
> My biggest bottleneck seems to be the incident nodes and their relationships:
>
> neo4j-sh (?)$ START n=node(*) MATCH (n)-[r]-() RETURN DISTINCT labels(n), type(r), count(*) ORDER BY labels(n)[0], type(r);
> +------------------------------------------------------------------------+
> | labels(n)               | type(r)                          | count(*)  |
> +------------------------------------------------------------------------+
> | ["DEGREE_OF_HARM"]      | "HAS_INCIDENT_DEGREE_OF_HARM"    | 2120424   |
> | ["DEGREE_OF_HARM"]      | "HAS_NRLS_DATA_TYPE"             | 7         |
> | ["INCIDENT"]            | "HAS_INCIDENT_CATEGORY"          | 2120457   |
> | ["INCIDENT"]            | "HAS_INCIDENT_DEGREE_OF_HARM"    | 2120424   |
> | ["INCIDENT"]            | "HAS_INCIDENT_PATIENT"           | 2120432   |
> | ["INCIDENT"]            | "HAS_INCIDENT_REPORTER"          | 2120442   |
> | ["INCIDENT"]            | "HAS_INCIDENT_SPECIALITY"        | 2120450   |
> | ["INCIDENT"]            | "HAS_NRLS_DATA_TYPE"             | 2120486   |
> | ["INCIDENT"]            | "IS_NHS_TRUST_INCIDENT"          | 2120483   |
> | ["INCIDENT"]            | "IS_NHS_TRUST_LOCATION_INCIDENT" | 2114664   |
> | ["INCIDENT_CATEGORY"]   | "HAS_INCIDENT_CATEGORY"          | 2120457   |
> | ["INCIDENT_CATEGORY"]   | "HAS_NRLS_DATA_TYPE"             | 16        |
> | ["INCIDENT_REPORTER"]   | "HAS_INCIDENT_REPORTER"          | 2120442   |
> | ["INCIDENT_REPORTER"]   | "HAS_NRLS_DATA_TYPE"             | 12        |
> | ["INCIDENT_SPECIALITY"] | "HAS_INCIDENT_SPECIALITY"        | 2120450   |
> | ["INCIDENT_SPECIALITY"] | "HAS_NRLS_DATA_TYPE"             | 17        |
> | ["NHS_TRUST"]           | "HAS_NHS_TRUST_LOCATION"         | 480       |
> | ["NHS_TRUST"]           | "HAS_NRLS_DATA_TYPE"             | 63        |
> | ["NHS_TRUST"]           | "IS_NHS_TRUST_INCIDENT"          | 2120483   |
> | ["NHS_TRUST_LOCATION"]  | "HAS_NHS_TRUST_LOCATION"         | 480       |
> | ["NHS_TRUST_LOCATION"]  | "IS_NHS_TRUST_LOCATION_INCIDENT" | 2114664   |
> | ["NRLS_DATA_TYPE"]      | "HAS_NRLS_DATA_TYPE"             | 2123426   |
> | ["PATIENT"]             | "HAS_INCIDENT_PATIENT"           | 2120432   |
> | ["PATIENT"]             | "HAS_NRLS_DATA_TYPE"             | 2825      |
> +------------------------------------------------------------------------+
> 24 rows
> 247418 ms
>
> MERGE seems to be slightly more consistent in performance than CREATE UNIQUE, though not that much faster.
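[Editor's note: for context, a per-incident MERGE of the kind being timed here would look roughly like the following. The property values are illustrative; each MERGE matches or creates on the unique-constrained key, then the final MERGE matches or creates the relationship between the two bound nodes.]

```cypher
MERGE (i:INCIDENT { incident_id: 12345 })
MERGE (d:DEGREE_OF_HARM { degree_of_harm: 'Low' })
MERGE (i)-[:HAS_INCIDENT_DEGREE_OF_HARM]->(d);
```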
> I've tried the following to tune the instance (note I have 8G of RAM on the VM and there's nothing else using it besides Neo4j and my extract process, which never takes up much more than 100M of RAM now that I've tuned it with MAJ's suggestions):
>
> cache_type=hpc
> node_cache_array_fraction=6
> relationship_cache_array_fraction=7
> #node_cache_size=1024
> relationship_cache_size=2G
>
> I haven't bothered tuning node_cache_size itself, since it's my relationship store that seems to be the biggest access point: accessing the node count takes 14 seconds, but accessing a relationship count takes around 2-3 minutes.
>
> Current heap usage after a restart, while the script is running and after processing 1000 rows, is ~500M.
>
> Current neostore sizes are:
>
> [root@miyu graph.db]# ls -l neostore* | awk '{printf("%10s %s\n", $5, $9)}'
>         63 neostore
>          9 neostore.id
>         55 neostore.labeltokenstore.db
>          9 neostore.labeltokenstore.db.id
>        456 neostore.labeltokenstore.db.names
>          9 neostore.labeltokenstore.db.names.id
>   29850422 neostore.nodestore.db
>          9 neostore.nodestore.db.id
>         68 neostore.nodestore.db.labels
>          9 neostore.nodestore.db.labels.id
>  177676780 neostore.propertystore.db
>        128 neostore.propertystore.db.arrays
>          9 neostore.propertystore.db.arrays.id
>          9 neostore.propertystore.db.id
>        162 neostore.propertystore.db.index
>          9 neostore.propertystore.db.index.id
>        722 neostore.propertystore.db.index.keys
>          9 neostore.propertystore.db.index.keys.id
>  679805312 neostore.propertystore.db.strings
>          9 neostore.propertystore.db.strings.id
>  560433951 neostore.relationshipstore.db
>          9 neostore.relationshipstore.db.id
>         45 neostore.relationshiptypestore.db
>          9 neostore.relationshiptypestore.db.id
>        380 neostore.relationshiptypestore.db.names
>          9 neostore.relationshiptypestore.db.names.id
>       1600 neostore.schemastore.db
>          9 neostore.schemastore.db.id
>
> Current cached mappings settings are:
>
> neostore.nodestore.db.mapped_memory=50M
> neostore.relationshipstore.db.mapped_memory=756M
> neostore.propertystore.db.mapped_memory=300M
> neostore.propertystore.db.strings.mapped_memory=756M
> neostore.propertystore.db.arrays.mapped_memory=50M
>
> Current initial heap settings are:
>
> # Initial Java Heap Size (in MB)
> wrapper.java.initmemory=2048
>
> # Maximum Java Heap Size (in MB)
> wrapper.java.maxmemory=5632
>
> Current schema:
>
> neo4j-sh (?)$ schema
> Welcome to the Neo4j Shell! Enter 'help' for a list of commands
> [Reconnected to server]
> Indexes
>   ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE (for uniqueness constraint)
>   ON :INCIDENT(incident_description)           ONLINE
>   ON :INCIDENT(incident_timestamp)             ONLINE
>   ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint)
>   ON :INCIDENT_CATEGORY(category_level_01)     ONLINE (for uniqueness constraint)
>   ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE (for uniqueness constraint)
>   ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE (for uniqueness constraint)
>   ON :NHS_TRUST(name)                          ONLINE (for uniqueness constraint)
>   ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE (for uniqueness constraint)
>   ON :NRLS_DATA_TYPE(code)                     ONLINE (for uniqueness constraint)
>   ON :PATIENT(patient_age)                     ONLINE
>   ON :PATIENT(patient_sex)                     ONLINE
>   ON :PATIENT(patient_ethnicity)               ONLINE
>
> Constraints
>   ON (nrls_data_type:NRLS_DATA_TYPE) ASSERT nrls_data_type.code IS UNIQUE
>   ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE
>   ON (degree_of_harm:DEGREE_OF_HARM) ASSERT degree_of_harm.degree_of_harm IS UNIQUE
>   ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
>   ON (nhs_trust_location:NHS_TRUST_LOCATION) ASSERT nhs_trust_location.location_level_01 IS UNIQUE
>   ON (incident_reporter:INCIDENT_REPORTER) ASSERT incident_reporter.reporter_level_01 IS UNIQUE
>   ON (incident_category:INCIDENT_CATEGORY) ASSERT incident_category.category_level_01 IS UNIQUE
>   ON (incident_speciality:INCIDENT_SPECIALITY) ASSERT
>       incident_speciality.speciality_level_01 IS UNIQUE
>
> I'm going to keep trying to tweak, but since I can't use property index hints with my MERGE statements (which I think would help with the incident relationships), I'm just loading anyway so I can get this done, as I've been at it for a while.
>
> If you have any further suggestions (or anyone else does), I'd be glad to try them out.
>
> ~ icenine
>
> On Wednesday, January 1, 2014 10:33:33 PM UTC, Michael Hunger wrote:
>>
>> Great!
>>
>> Looks good.
>>
>> I think if you use parameters with Neo4p's Cypher support (passing in Perl hashes for the parameters) and use the transactional endpoint with your import data, it shouldn't take too long to import your 2 million data points.
>>
>> #1 parameters
>> #2 transactional endpoint
>> #3 sensible batch size (e.g. 20k per commit)
>> #4 usually, when just creating data, you don't have to return anything
>>
>> Cheers
>>
>> Michael
>>
>> Am 01.01.2014 um 19:48 schrieb JDS <[email protected]>:
>>
>> BTW, I love the simplicity of something like this:
>>
>> neo4j-sh (?)$ MATCH (ndt:NRLS_DATA_TYPE { code : 'IN05_lvl1' })
>> > MERGE (ic:INCIDENT_CATEGORY { category_level_01 : 'FOOBAR' })-[r:HAS_NRLS_DATA_TYPE]->(ndt)
>> > RETURN ic, r;
>> +------------------------------------------------------------------------------+
>> | ic                                         | r                               |
>> +------------------------------------------------------------------------------+
>> | Node[2121668]{category_level_01:"FOOBAR"}  | :HAS_NRLS_DATA_TYPE[16880045]{} |
>> +------------------------------------------------------------------------------+
>> 1 row
>> Nodes created: 1
>> Relationships created: 1
>> Properties set: 1
>>
>> On Wednesday, January 1, 2014 6:36:04 PM UTC, Michael Hunger wrote:
>>>
>>> No worries, it's still early in the New Year :)
>>>
>>> But you definitely want to write a blog post about what you're doing with Neo4j, right?
>>>
>>> Happy New Year
>>>
>>> Michael
>>>
>>> Am 01.01.2014 um 19:33 schrieb JDS <[email protected]>:
>>>
>>> Ugh *shame*
>>>
>>> Thanks Mike
>>>
>>> On Wednesday, January 1, 2014 6:32:16 PM UTC, Michael Hunger wrote:
>>>>
>>>> Typo:
>>>>
>>>> In query #4 you use "NRLS_DATA_TYPE"; in the previous ones you use "NLRS_DATA_TYPE".
>>>>
>>>> N_RL_S vs. N_LR_S
>>>>
>>>> HTH
>>>>
>>>> Michael
>>>>
>>>> Am 01.01.2014 um 19:26 schrieb JDS <[email protected]>:
>>>>
>>>> Maybe I'm wrong, but I thought that all three of the top queries would return data, based on what the 1st and 4th queries return, so I'm a little confused. Server is 2.0.0 enterprise stable.
>>>>
>>>> neo4j-sh (?)$ START n=node(*) WHERE HAS (n.code) AND n.code = 'IN05_lvl1' RETURN n.code;
>>>> +-------------+
>>>> | n.code      |
>>>> +-------------+
>>>> | "IN05_lvl1" |
>>>> +-------------+
>>>> 1 row
>>>> 102405 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE) WHERE ndt.code = 'IN05_lvl1' RETURN ndt.code;
>>>> +----------+
>>>> | ndt.code |
>>>> +----------+
>>>> +----------+
>>>> 0 row
>>>> 31 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE { code : 'IN05_lvl1' }) RETURN ndt.code;
>>>> +----------+
>>>> | ndt.code |
>>>> +----------+
>>>> +----------+
>>>> 0 row
>>>> 20 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NRLS_DATA_TYPE) RETURN ndt.code;
>>>> +-------------------+
>>>> | ndt.code          |
>>>> +-------------------+
>>>> | "RP07"            |
>>>> | "IN07"            |
>>>> | "Age_at_Incident" |
>>>> | "ST01_LVL1"       |
>>>> | "PD09"            |
>>>> | "PD05_lvl1"       |
>>>> | "IN05_lvl1"       |
>>>> | "IN03_lvl1"       |
>>>> | "IN07_01MMYY"     |
>>>> | "PD11"            |
>>>> | "IN02_A_01"       |
>>>> | "IN01"            |
>>>> | "PD02"            |
>>>> +-------------------+
>>>> 13 rows
>>>> 113 ms
>>>> neo4j-sh (?)$ MATCH (ndt:NLRS_DATA_TYPE { code : "IN05_lvl1" }) RETURN ndt.code;
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
