José, let's continue the discussion on the Google group.
By "larger" I meant the amount of data, not the size of the statements. As I also point out in various places, we recommend creating only small subgraphs, each with a single statement, separated by semicolons, e.g. up to 100 nodes and relationships. Gigantic statements just make the parser explode.

I recommend splitting them up into statements that each create a small subgraph, or creating the nodes first and later matching them by label & property to connect them. Make sure to have appropriate indexes / constraints. You should also surround blocks of statements with begin and commit commands.

Sent from my iPhone
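As a rough illustration of the pattern described above (a minimal sketch only: a uniqueness constraint up front, then small semicolon-separated CREATE statements wrapped in the shell's begin and commit commands; it reuses the labels and the MEDCODE property from José's node example quoted below, and treating MEDCODE as the unique key is an assumption):

// a constraint (which also creates an index) so nodes can later be found by label & property
CREATE CONSTRAINT ON (n:TEST_NAME) ASSERT n.MEDCODE IS UNIQUE;

begin
// small, single-statement subgraphs separated by semicolons,
// keeping each begin/commit block to roughly 100 nodes and relationships
CREATE (:`CLT SOURCE`:TEST_NAME {NAME:'Acetoacetate (ketone body)', MEDCODE:10010});
CREATE (:`CLT SOURCE`:TEST_NAME {NAME:'Acetone (ketone body)', MEDCODE:99999});   // hypothetical second node
commit

The relationships would then go into later begin/commit blocks, matching the endpoints by :TEST_NAME and MEDCODE instead of reusing identifiers from earlier statements; a sketch of that follows the quoted thread below.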
> On 19 Nov 2014, at 04:18, José F. Morales Ph.D. <[email protected]> wrote:
>
> Hey Michael and Kenny,
>
> Thank you guys a bunch for the help.
>
> Let me give you a little background. I am charged with making a prototype of a
> tool ("LabCards") that we hope to use in the hospital and beyond at some
> point. In preparation for making the main prototype, I made two prior Neo4j
> databases that worked exactly as I wanted them to. The first database was
> built with NIH data and had 183 nodes and around 7,500 relationships. The
> second database was the pre-prototype, and it had 1,080 nodes and around 2,000
> relationships. I created these in the form of Cypher statements and either
> pasted them into the Neo4j browser or used the neo4j shell and loaded them as
> text files. Before doing that I checked the Cypher code with Sublime Text 2,
> which highlights the code. Both databases loaded fine with both methods and
> did what I wanted them to do.
>
> As you might imagine, the prototype is an expansion of the mini-prototype.
> It has almost the same data model, and I built it as a series of Cypher
> statements as well. My first version of the prototype had ~60k nodes and
> 160k relationships.
>
> I should say that a feature of this model is that all the source and target
> nodes have relationships that point to each other. No node points to itself,
> as far as I know. This file was 41 MB of Cypher code that I tried to load via
> the neo4j shell.
>
> In fact, I was following your advice on loading big data files... "Use the
> Neo4j-Shell for larger Imports"
> (http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/).
> This first time out, Java maxed out its allocated memory at 4 GB (x2) and did
> not complete loading in 24 hours. I killed it.
>
> I then contacted Kenny, and he generously gave me some advice regarding the
> properties file (below), and again the same deal (4 GB of memory, x2) with
> Java and no success in about 24 hours. I killed that one too.
>
> Given my loading problems, I have subsequently eliminated a bunch of
> relationships (100k), so that the file is now 21 MB. A lot of these were
> duplicates that I didn't pick up before, and I am trying it again. So far,
> 15 minutes into it, it is a similar situation. The difference is that Java is
> using 1.7 and 0.5 GB of memory.
>
> Here is the Cypher for a typical node...
>
> CREATE (CLT_1:`CLT SOURCE`:BIOMEDICAL:TEST_NAME:`Laboratory
> Procedure`:lbpr:`Procedures`:PROC:T059:`B1.3.1.1`:TZ {NAME:'Acetoacetate
> (ketone body)', SYNONYM:'', Sample:'SERUM, URINE', MEDCODE:10010, CUI:'NA'})
>
> Here is the Cypher for a typical relationship...
>
> CREATE (CLT_1)-[:MEASUREMENT_OF {Phylum:'TZ', CAT:'TEST.NAME', Ui_Rl:'T157', RESULT:'', Type:'', Semantic_Distance_Score:'NA', Path_Length:'NA', Path_Steps:'NA'}]->(CLT_TARGET_3617),
>
> I will let you know how this one turns out. I hope this is helpful.
>
> Many, many thanks, fellas!!!
>
> Jose
>
>> On Nov 18, 2014, at 8:33 PM, Michael Hunger
>> <[email protected]> wrote:
>>
>> Hi José,
>>
>> Can you perhaps provide more detail about your dataset (e.g. a sample of the
>> CSV, its size, etc.; perhaps the output of csvstat (from csvkit) would be
>> helpful) and the Cypher queries you use to load it?
>>
>> Have you seen my other blog post, which explains two big caveats that people
>> run into when trying this? jexp.de/blog/2014/10/load-cvs-with-success/
>>
>> Cheers, Michael
>>
>>> On Tue, Nov 18, 2014 at 8:43 PM, Kenny Bastani <[email protected]> wrote:
>>> Hey Jose,
>>>
>>> There is definitely an answer. Let me put you in touch with the data import
>>> master: Michael Hunger.
>>>
>>> Michael, I think the answers here will be pretty straightforward for you.
>>> You met Jose at GraphConnect NY last year, so I'll spare any introductions.
>>> The memory map configurations I provided need to be calculated and
>>> customized for the data import volume.
>>>
>>> Thanks,
>>>
>>> Kenny
>>>
>>> Sent from my iPhone
>>>
>>> On Nov 18, 2014, at 11:37 AM, José F. Morales Ph.D. <[email protected]>
>>> wrote:
>>>
>>>> Kenny,
>>>>
>>>> In 3 hours it will have been trying to load for 24 hours, so this is not
>>>> working. I'm catching shit from my crew too, so I've got to fix this, like,
>>>> soon.
>>>>
>>>> I haven't done this before, but can I break up the data and load it in
>>>> pieces?
>>>>
>>>> Jose
>>>>
>>>>> On Nov 17, 2014, at 3:35 PM, Kenny Bastani <[email protected]> wrote:
>>>>>
>>>>> Hey Jose,
>>>>>
>>>>> Try turning off the object cache. Add this line to the neo4j.properties
>>>>> configuration file:
>>>>>
>>>>> cache_type=none
>>>>>
>>>>> Then retry your import. Also, enable memory-mapped files by adding these
>>>>> lines to the neo4j.properties file:
>>>>>
>>>>> neostore.nodestore.db.mapped_memory=2048M
>>>>> neostore.relationshipstore.db.mapped_memory=4096M
>>>>> neostore.propertystore.db.mapped_memory=200M
>>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Kenny
>>>>>
>>>>>
>>>>> From: José F. Morales Ph.D. <[email protected]>
>>>>> Sent: Monday, November 17, 2014 12:32 PM
>>>>> To: Kenny Bastani
>>>>> Subject: latest
>>>>>
>>>>> Hey Kenny,
>>>>>
>>>>> Here's the deal. As I think I said, I loaded the 41 MB file of Cypher
>>>>> code via the neo4j shell. Before I tried the LabCards file, I tried the
>>>>> movies file and a UMLS database I made (8k relationships). They worked
>>>>> fine.
>>>>>
>>>>> The LabCards file is taking a LONG time to load: I started at about
>>>>> 9:30-10 PM last night and it's 3 PM now.
>>>>>
>>>>> I've wondered if it's hung up, but Activity Monitor's memory usage is
>>>>> constant at two rows of Java at 4 GB, with the kernel at 1 GB. The CPU
>>>>> panel changes a lot, so it looks like it's doing its thing.
>>>>>
>>>>> So is this how things are expected to go? Do you think the loading is
>>>>> gonna take a day or two?
>>>>>
>>>>> Jose
>>>>>
>>>>>
>>>>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>>>>> José F. Morales Ph.D.
>>>>> Instructor
>>>>> Cell Biology and Pathology
>>>>> Columbia University Medical Center
>>>>> [email protected]
>>>>> 212-452-3351
>>>>
>>>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>>>> José F. Morales Ph.D.
>>>> Instructor
>>>> Cell Biology and Pathology
>>>> Columbia University Medical Center
>>>> [email protected]
>>>> 212-452-3351
>
> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
> José F. Morales Ph.D.
> Instructor
> Cell Biology and Pathology
> Columbia University Medical Center
> [email protected]
> 212-452-3351
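One note on the relationship statement quoted above: it refers to identifiers (CLT_1, CLT_TARGET_3617) that only exist inside the same Cypher statement, so once the file is split up as Michael suggests, those references no longer resolve. A minimal sketch of the match-by-label-and-property alternative, assuming MEDCODE uniquely identifies each node (the target's MEDCODE of 3617 is just a placeholder):

begin
// look the two endpoints up by label & indexed property, then connect them
MATCH (src:TEST_NAME {MEDCODE:10010})
MATCH (tgt:TEST_NAME {MEDCODE:3617})   // placeholder MEDCODE for CLT_TARGET_3617
CREATE (src)-[:MEASUREMENT_OF {Phylum:'TZ', CAT:'TEST.NAME', Ui_Rl:'T157',
  Semantic_Distance_Score:'NA', Path_Length:'NA', Path_Steps:'NA'}]->(tgt);
commit

Whether MEDCODE is really the right key is for José to confirm; the point is only that each statement stands on its own, so the shell can commit in small batches.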
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
