[Neo4j] Re: large cypher statements

José F. Morales Thu, 20 Nov 2014 05:21:00 -0800

OK, Gents,

I will implement your very gracious suggestions and let you know what 
happens.


Thanks a load,

Jose

On Wednesday, November 19, 2014 8:04:44 AM UTC-5, Michael Hunger wrote:
>
> José,
>
> Let's continue the discussion on the Google group.
>
> By "larger" I meant the amount of data, not the size of the statements.
>
> As I also point out in various places, we recommend creating only a small 
> subgraph with each statement (e.g. up to 100 nodes and rels), with the 
> statements separated by semicolons.
>
> Gigantic statements just make the parser explode.
>
> I recommend splitting them up into statements that each create a subgraph,
> or creating the nodes first and later matching them by label & property to 
> connect them. Make sure to have appropriate indexes / constraints.
>
> You should also surround blocks of statements with begin and commit 
> commands.
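
A minimal sketch of the begin/commit suggestion, assuming the import file has already been split into self-contained statements (none of them relying on an identifier defined in another statement). The function name `batch_statements` and the batch size are hypothetical placeholders, not something from the thread; the script only produces text for the neo4j shell and does not talk to a database.

```python
def batch_statements(statements, batch_size=100):
    """Wrap groups of Cypher statements in neo4j-shell begin/commit blocks.

    `batch_size` is an arbitrary illustrative default, not a tuned value.
    """
    lines = []
    for start in range(0, len(statements), batch_size):
        lines.append("begin")
        for stmt in statements[start:start + batch_size]:
            # Normalise each statement so it ends with exactly one semicolon.
            lines.append(stmt.rstrip().rstrip(";") + ";")
        lines.append("commit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical demo statements; real ones would come from the import file.
    demo = ["CREATE (:TEST_NAME {MEDCODE: %d})" % i for i in range(1, 6)]
    print(batch_statements(demo, batch_size=2))
```

The output can be written to a text file and loaded through the shell the same way as before; each begin/commit block then becomes its own transaction instead of one gigantic one.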
>
> Sent from my iPhone
>
> On 19.11.2014 at 04:18, José F. Morales Ph.D. <[email protected]> wrote:
>
> Hey Michael and Kenny
>
> Thank you guys a bunch for the help.
>
> Let me give you a little background.  I am charged with making a prototype of 
> a tool (“LabCards”) that we hope to use in the hospital and beyond at some 
> point.  In preparation for making the main prototype, I made two prior 
> Neo4j databases that worked exactly as I wanted them to.  The first 
> database was built with NIH data and had 183 nodes and around 7500 
> relationships.  The second database was the Pre-prototype and it had 1080 
> nodes and around 2000 relationships.  I created these in the form of Cypher 
> statements and either pasted them into the Neo4j browser or used the neo4j 
> shell and loaded them as text files. Before doing that I checked the Cypher 
> code with Sublime Text 2, which highlights the code. Both databases loaded 
> fine with both methods and did what I wanted them to do.  
>
> As you might imagine, the prototype is an expansion of the mini-prototype. 
>  It has almost the same data model and I built it as a series of cypher 
> statements as well.  My first version of the prototype had ~60k nodes and 
> 160k relationships.  
>
> I should say that a feature of this model is that all the source and 
> target nodes have relationships that point to each other.  No node points 
> to itself as far as I know. This file was 41 MB of Cypher code that I tried 
> to load via the neo4j shell.  
>
> In fact, I was following your advice on loading big data files... “Use the 
> Neo4j-Shell for larger Imports”  (
> http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/). 
>   This first time out, two Java processes each maxed out at 4 GB of memory 
> and the load did not complete in 24 hours.  I killed it. 
>
> I then contacted Kenny, and he generously gave me some advice regarding 
> the properties file (below), but it was the same deal again: two Java 
> processes at 4 GB of memory each and no success in about 24 hours. I killed 
> that one too.
>
> Given my loading problems, I have since eliminated a bunch of 
> relationships (100k), so the file is now 21 MB. A lot of these were 
> duplicates that I didn’t pick up before. I am trying it again.  So far, 15 
> min into it, the situation is similar.  The difference is that the two Java 
> processes are using 1.7 and 0.5 GB of memory.
>
> Here is the cypher for a typical node…
>
> CREATE ( CLT_1:`CLT SOURCE`:BIOMEDICAL:TEST_NAME:`Laboratory Procedure`:lbpr:`Procedures`:PROC:T059:`B1.3.1.1`:TZ{NAME:'Acetoacetate (ketone body)',SYNONYM:'',Sample:'SERUM, URINE',MEDCODE:10010,CUI:'NA'})
>
> Here is the cypher for a typical relationship...
>
> CREATE(CLT_1)-[:MEASUREMENT_OF{Phylum:'TZ',CAT:'TEST.NAME',Ui_Rl:'T157',RESULT:'',Type:'',Semantic_Distance_Score:'NA',Path_Length:'NA',Path_Steps:'NA'}]->(CLT_TARGET_3617),
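
For illustration only, here is a rough sketch of how a node and a relationship like the ones above could be emitted as standalone statements that find their endpoints by label and property rather than by a shared identifier such as CLT_1. Several things are assumptions and not from the thread: that MEDCODE uniquely identifies a test node, the `TARGET` label and `ID` property for the target node (its definition is not shown here), and all helper names. The `CREATE INDEX ON :Label(property)` form is the Neo4j 2.x syntax current at the time of this thread.

```python
def cypher_literal(value):
    """Render a str or a number as a Cypher literal; other types are not handled."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(value)


def props_map(props):
    """Render a dict as ' {k: v, ...}', or '' when there are no properties."""
    if not props:
        return ""
    body = ", ".join("%s: %s" % (k, cypher_literal(v)) for k, v in props.items())
    return " {" + body + "}"


def index_statement(label, key):
    """Index the lookup property so the later MATCH does not scan every node."""
    return "CREATE INDEX ON :`%s`(%s);" % (label, key)


def node_statement(labels, props):
    """One self-contained CREATE per node, with no identifier to carry over."""
    label_part = "".join(":`%s`" % label for label in labels)
    return "CREATE (%s%s);" % (label_part, props_map(props))


def match_node(var, label, key, value):
    """Pattern that finds an existing node by label and property."""
    return "(%s:`%s` {%s: %s})" % (var, label, key, cypher_literal(value))


def rel_statement(src, dst, rel_type, props):
    """Connect two existing nodes; src and dst are (label, key, value) tuples."""
    return "MATCH %s, %s CREATE (a)-[:`%s`%s]->(b);" % (
        match_node("a", *src), match_node("b", *dst), rel_type, props_map(props))


if __name__ == "__main__":
    print(index_statement("TEST_NAME", "MEDCODE"))
    print(node_statement(["TEST_NAME", "Laboratory Procedure"],
                         {"NAME": "Acetoacetate (ketone body)", "MEDCODE": 10010}))
    # TARGET / ID are hypothetical stand-ins for the target node's label and key.
    print(rel_statement(("TEST_NAME", "MEDCODE", 10010), ("TARGET", "ID", 3617),
                        "MEASUREMENT_OF", {"Phylum": "TZ"}))
```

Index and constraint statements would be run on their own before the data statements, since schema changes and data writes do not share a transaction.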
>
> I will let you know how this one turns out.  I hope this is helpful.
>
> Many, many thanks fellas!!!
>
> Jose
>
> On Nov 18, 2014, at 8:33 PM, Michael Hunger <[email protected]> wrote:
>
> Hi José,
>
> Can you perhaps provide more detail about your dataset (e.g. a sample of the 
> CSV, its size, etc.; the output of csvstat from csvkit would be helpful) 
> and the Cypher queries you use to load it?
>
> Have you seen my other blog post, which explains two big caveats that 
> people run into when trying this? 
> jexp.de/blog/2014/10/load-cvs-with-success/
>
> Cheers, Michael
>
> On Tue, Nov 18, 2014 at 8:43 PM, Kenny Bastani <[email protected]> wrote:
>
>>  Hey Jose,
>>
>>  There is definitely an answer. Let me put you in touch with the data 
>> import master: Michael Hunger.
>>
>>  Michael, I think the answers here will be pretty straightforward for 
>> you. You met Jose at GraphConnect NY last year, so I'll spare any 
>> introductions. The memory map configurations I provided need to be 
>> calculated and customized for the data import volume.
>>
>>  Thanks,
>>
>>  Kenny
>>
>> Sent from my iPhone
>>
>> On Nov 18, 2014, at 11:37 AM, José F. Morales Ph.D. <[email protected]> wrote:
>>
>>   Kenny,  
>>
>>  In 3 hours it will have been trying to load for 24 hours, so this is not 
>> working.  I’m catching shit from my crew too, so I got to fix this like 
>> soon.
>>
>>  I haven’t done this before, but can I break up the data and load it in 
>> pieces?
>>
>>  Jose
>>
>>  On Nov 17, 2014, at 3:35 PM, Kenny Bastani <[email protected]> wrote:
>>
>>  Hey Jose,
>>
>>  Try turning off the object cache. Add this line to the neo4j.properties 
>> configuration file:
>>
>>  cache_type=none
>>
>> Then retry your import. Also, enable memory mapped files by adding these 
>> lines to the neo4j.properties file:
>>
>>  neostore.nodestore.db.mapped_memory=2048M
>> neostore.relationshipstore.db.mapped_memory=4096M
>> neostore.propertystore.db.mapped_memory=200M
>> neostore.propertystore.db.strings.mapped_memory=500M
>> neostore.propertystore.db.arrays.mapped_memory=500M
>>  
>>  Thanks,
>>
>>  Kenny
>>  
>>  ------------------------------
>> *From:* José F. Morales Ph.D. <[email protected]>
>> *Sent:* Monday, November 17, 2014 12:32 PM
>> *To:* Kenny Bastani
>> *Subject:* latest 
>>  
>>   Hey Kenny,
>>
>>  Here’s the deal. As I think I said, I loaded the 41 MB file of Cypher 
>> code via the neo4j shell. Before I tried the LabCards file, I tried the 
>> movies file and a UMLS database I made (8k relationships).  They worked 
>> fine. 
>>
>>  The LabCards file is taking a LONG time to load: I started at 
>> about 9:30 - 10 PM last night and it’s 3 PM now.  
>>
>>  I’ve wondered if it’s hung up. The Activity Monitor’s memory usage is 
>> constant at two rows of Java at 4 GB each, with the kernel at 1 GB.  The 
>> CPU panel changes a lot, so it looks like it’s doing its thing. 
>>
>>  So is this to be expected?  Do you think the loading is 
>> gonna take a day or two?  
>>
>>  Jose
>>  
>>  
>>    |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>> José F. Morales Ph.D.
>>  Instructor
>>  Cell Biology and Pathology
>> Columbia University Medical Center
>>  [email protected]
>>  212-452-3351
>>     
>>   
>>   
>
>  
>
