OK, Gents, I will implement your very gracious suggestions and let you know what happens.
Thanks a load,
Jose

On Wednesday, November 19, 2014 8:04:44 AM UTC-5, Michael Hunger wrote:
>
> José,
>
> Let's continue the discussion on the Google group.
>
> By "larger" I meant the amount of data, not the size of the statements.
>
> As I also point out in various places, we recommend creating only small subgraphs with a single statement, with statements separated by semicolons, e.g. up to 100 nodes and rels.
>
> Gigantic statements just let the parser explode.
>
> I recommend splitting them up into statements that each create a subgraph, or creating the nodes first and later matching them by label & property to connect them. Make sure to have appropriate indexes / constraints.
>
> You should also surround blocks of statements with begin and commit commands.
>
> Sent from my iPhone
>
> On 19.11.2014 at 04:18, José F. Morales Ph.D. wrote:
>
> Hey Michael and Kenny
>
> Thank you guys a bunch for the help.
>
> Let me give you a little background. I am charged with making a prototype of a tool (“LabCards”) that we hope to use in the hospital and beyond at some point. In preparation for making the main prototype, I made two prior Neo4j databases that worked exactly as I wanted them to. The first database was built with NIH data and had 183 nodes and around 7500 relationships. The second database was the Pre-prototype and it had 1080 nodes and around 2000 relationships. I created these in the form of cypher statements and either pasted them in the Neo4j browser or used the neo4j shell and loaded them as text files. Before doing that I checked the cypher code with Sublime Text 2, which highlights the code. Both databases loaded fine with both methods and did what I wanted them to do.
>
> As you might imagine, the prototype is an expansion of the mini-prototype. It has almost the same data model and I built it as a series of cypher statements as well. My first version of the prototype had ~60k nodes and 160k relationships.
>
> I should say that a feature of this model is that all the source and target nodes have relationships that point to each other. No node points to itself as far as I know. This file was 41 MB of cypher code that I tried to load via the neo4j shell.
>
> In fact, I was following your advice on loading big data files... “Use the Neo4j-Shell for larger Imports” (http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/). This first time out, Java maxed out its allocated memory (two Java processes at 4 GB each) and did not complete loading in 24 hours. I killed it.
>
> I then contacted Kenny, and he generously gave me some advice regarding the properties file (below), and again it was the same deal with Java (two processes at 4 GB each) and no success in about 24 hours. I killed that one too.
>
> Given my loading problems, I have subsequently eliminated a bunch of relationships (100k) so that the file is now 21 MB. A lot of these were duplicates that I didn’t pick up before, and I am trying it again. So far, 15 min into it, it's a similar situation. The difference is that Java is using 1.7 GB and 0.5 GB of memory.
>
> Here is the cypher for a typical node…
>
> CREATE ( CLT_1:`CLT SOURCE`:BIOMEDICAL:TEST_NAME:`Laboratory Procedure`:lbpr:`Procedures`:PROC:T059:`B1.3.1.1`:TZ{NAME:'Acetoacetate (ketone body)',SYNONYM:'',Sample:'SERUM, URINE',MEDCODE:10010,CUI:'NA'})
>
> Here is the cypher for a typical relationship...
>
> CREATE(CLT_1)-[:MEASUREMENT_OF{Phylum:'TZ',CAT:'TEST.NAME',Ui_Rl:'T157',RESULT:'',Type:'',Semantic_Distance_Score:'NA',Path_Length:'NA',Path_Steps:'NA'}]->(CLT_TARGET_3617),
>
> I will let you know how this one turns out. I hope this is helpful.
>
> Many, many thanks fellas!!!
>
> Jose
>
> On Nov 18, 2014, at 8:33 PM, Michael Hunger wrote:
>
> Hi José,
>
> can you provide perhaps more detail about your dataset (e.g. sample of the csv, size, etc.
> perhaps an output of csvstat (of csvkit) would be helpful), your cypher queries to load it
>
> Have you seen my other blog post, which explains two big caveats that people run into when trying this?
> jexp.de/blog/2014/10/load-cvs-with-success/
>
> Cheers, Michael
>
> On Tue, Nov 18, 2014 at 8:43 PM, Kenny Bastani wrote:
>
>> Hey Jose,
>>
>> There is definitely an answer. Let me put you in touch with the data import master: Michael Hunger.
>>
>> Michael, I think the answers here will be pretty straightforward for you. You met Jose at GraphConnect NY last year, so I'll spare any introductions. The memory map configurations I provided need to be calculated and customized for the data import volume.
>>
>> Thanks,
>>
>> Kenny
>>
>> Sent from my iPhone
>>
>> On Nov 18, 2014, at 11:37 AM, José F. Morales Ph.D. wrote:
>>
>> Kenny,
>>
>> In 3 hours it’ll have been trying to load for 24 hours, so this is not working. I’m catching shit from my crew too, so I got to fix this like soon.
>>
>> I haven’t done this before, but can I break up the data and load it in pieces?
>>
>> Jose
>>
>> On Nov 17, 2014, at 3:35 PM, Kenny Bastani wrote:
>>
>> Hey Jose,
>>
>> Try turning off the object cache. Add this line to the neo4j.properties configuration file:
>>
>> cache_type=none
>>
>> Then retry your import. Also, enable memory mapped files by adding these lines to the neo4j.properties file:
>>
>> neostore.nodestore.db.mapped_memory=2048M
>> neostore.relationshipstore.db.mapped_memory=4096M
>> neostore.propertystore.db.mapped_memory=200M
>> neostore.propertystore.db.strings.mapped_memory=500M
>> neostore.propertystore.db.arrays.mapped_memory=500M
>>
>> Thanks,
>>
>> Kenny
>>
>> ------------------------------
>> *From:* José F. Morales Ph.D.
>> *Sent:* Monday, November 17, 2014 12:32 PM
>> *To:* Kenny Bastani
>> *Subject:* latest
>>
>> Hey Kenny,
>>
>> Here’s the deal. As I think I said, I loaded the 41 MB file of cypher code via the neo4j shell. Before I tried the LabCards file, I tried the movies file and a UMLS database I made (8k relationships). They worked fine.
>>
>> The LabCards file is taking a LONG time to load: I started at about 9:30 - 10 PM last night and it's 3 PM now.
>>
>> I’ve wondered if it's hung up. The activity monitor’s memory usage is constant at two rows of Java at 4 GB, with the kernel at 1 GB. The CPU panel changes a lot, so it looks like it's doing its thing.
>>
>> So is this how things are to be expected? Do you think the loading is gonna take a day or two?
>>
>> Jose
>>
>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>> José F. Morales Ph.D.
>> Instructor
>> Cell Biology and Pathology
>> Columbia University Medical Center
>> 212-452-3351
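
For readers following along, Michael's two suggestions in this thread (small statements ending in semicolons and grouped into begin/commit blocks, plus connecting nodes by matching on label & property) could be sketched roughly as below. This is only an illustration under several assumptions that are not from the thread: the input has one Cypher statement per line, the batch size of 100 is borrowed loosely from the "up to 100 nodes and rels" remark, MEDCODE is treated as a unique lookup key on both endpoints, and the function names `batch_statements` and `match_and_connect` are hypothetical. Note that identifiers such as CLT_1 only exist inside the statement that defines them, so once the file is split, relationship statements have to look up their endpoints again.

```python
from typing import Iterable, Iterator, List


def _wrap(batch: List[str]) -> Iterator[str]:
    """Surround one block of statements with the shell's begin/commit commands."""
    yield "begin"
    yield from batch
    yield "commit"


def batch_statements(statements: Iterable[str], batch_size: int = 100) -> Iterator[str]:
    """Yield lines for a neo4j-shell script: statements grouped into
    begin/commit blocks of at most `batch_size` (an assumed value)."""
    batch: List[str] = []
    for raw in statements:
        # Drop trailing commas/semicolons left over from one giant statement.
        stmt = raw.strip().rstrip(",;").strip()
        if not stmt:
            continue  # skip blank lines
        batch.append(stmt + ";")
        if len(batch) == batch_size:
            yield from _wrap(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, block
        yield from _wrap(batch)


def match_and_connect(src_label: str, src_code: int,
                      dst_label: str, dst_code: int, rel_type: str) -> str:
    """Build one standalone statement that finds both endpoints by label and
    MEDCODE (assumed unique here, purely for illustration) and links them."""
    return (
        f"MATCH (a:`{src_label}` {{MEDCODE: {src_code}}}), "
        f"(b:`{dst_label}` {{MEDCODE: {dst_code}}}) "
        f"CREATE (a)-[:`{rel_type}`]->(b)"
    )


if __name__ == "__main__":
    demo = [
        "CREATE (:TEST_NAME {MEDCODE: 10010}),",
        "CREATE (:TARGET {MEDCODE: 3617}),",
        match_and_connect("TEST_NAME", 10010, "TARGET", 3617, "MEASUREMENT_OF"),
    ]
    for line in batch_statements(demo, batch_size=2):
        print(line)
```

For the MATCH lookups to stay fast, the property used as the key would need an index or constraint, as Michael notes; in Neo4j 2.x that would be something along the lines of `CREATE INDEX ON :TEST_NAME(MEDCODE);` run before the relationship statements.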
