José, let's continue the discussion on the Google group.
By "larger" I meant the amount of data, not the size of the statements. As I also point out in various places, we recommend creating only small subgraphs, each with a single statement, separated by semicolons, e.g. up to 100 nodes and relationships. Gigantic statements just make the parser explode.

I recommend splitting them up into statements that each create a small subgraph, or creating the nodes first and later matching them by label & property to connect them. Make sure to have appropriate indexes / constraints. You should also surround blocks of statements with begin and commit commands.

Sent from my iPhone
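As a rough illustration of the pattern described above (a minimal sketch only: a uniqueness constraint up front, then small semicolon-separated CREATE statements wrapped in the shell's begin and commit commands; it reuses the labels and the MEDCODE property from José's node example quoted below, and treating MEDCODE as the unique key is an assumption):

// a constraint (which also creates an index) so nodes can later be found by label & property
CREATE CONSTRAINT ON (n:TEST_NAME) ASSERT n.MEDCODE IS UNIQUE;

begin
// small, single-statement subgraphs separated by semicolons,
// keeping each begin/commit block to roughly 100 nodes and relationships
CREATE (:`CLT SOURCE`:TEST_NAME {NAME:'Acetoacetate (ketone body)', MEDCODE:10010});
CREATE (:`CLT SOURCE`:TEST_NAME {NAME:'Acetone (ketone body)', MEDCODE:99999});   // hypothetical second node
commit

The relationships would then go into later begin/commit blocks, matching the endpoints by :TEST_NAME and MEDCODE instead of reusing identifiers from earlier statements; a sketch of that follows the quoted thread below.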
> On 19 Nov 2014, at 04:18, José F. Morales Ph.D. <[email protected]> wrote:
>
> Hey Michael and Kenny,
>
> Thank you guys a bunch for the help.
>
> Let me give you a little background. I am charged with making a prototype of a
> tool ("LabCards") that we hope to use in the hospital and beyond at some
> point. In preparation for making the main prototype, I made two prior Neo4j
> databases that worked exactly as I wanted them to. The first database was
> built with NIH data and had 183 nodes and around 7,500 relationships. The
> second database was the pre-prototype, and it had 1,080 nodes and around 2,000
> relationships. I created these in the form of Cypher statements and either
> pasted them into the Neo4j browser or used the neo4j shell and loaded them as
> text files. Before doing that I checked the Cypher code with Sublime Text 2,
> which highlights the code. Both databases loaded fine with both methods and
> did what I wanted them to do.
>
> As you might imagine, the prototype is an expansion of the mini-prototype.
> It has almost the same data model, and I built it as a series of Cypher
> statements as well. My first version of the prototype had ~60k nodes and
> 160k relationships.
>
> I should say that a feature of this model is that all the source and target
> nodes have relationships that point to each other. No node points to itself,
> as far as I know. This file was 41 MB of Cypher code that I tried to load via
> the neo4j shell.
>
> In fact, I was following your advice on loading big data files... "Use the
> Neo4j-Shell for larger Imports"
> (http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/).
> This first time out, Java maxed out its allocated memory at 4 GB (x2) and did
> not complete loading in 24 hours. I killed it.
>
> I then contacted Kenny, and he generously gave me some advice regarding the
> properties file (below), and again the same deal (4 GB of memory, x2) with
> Java and no success in about 24 hours. I killed that one too.
>
> Given my loading problems, I have subsequently eliminated a bunch of
> relationships (100k), so that the file is now 21 MB. A lot of these were
> duplicates that I didn't pick up before, and I am trying it again. So far,
> 15 minutes into it, it is a similar situation. The difference is that Java is
> using 1.7 and 0.5 GB of memory.
>
> Here is the Cypher for a typical node...
>
> CREATE (CLT_1:`CLT SOURCE`:BIOMEDICAL:TEST_NAME:`Laboratory
> Procedure`:lbpr:`Procedures`:PROC:T059:`B1.3.1.1`:TZ {NAME:'Acetoacetate
> (ketone body)', SYNONYM:'', Sample:'SERUM, URINE', MEDCODE:10010, CUI:'NA'})
>
> Here is the Cypher for a typical relationship...
>
> CREATE (CLT_1)-[:MEASUREMENT_OF {Phylum:'TZ', CAT:'TEST.NAME', Ui_Rl:'T157', RESULT:'', Type:'', Semantic_Distance_Score:'NA', Path_Length:'NA', Path_Steps:'NA'}]->(CLT_TARGET_3617),
>
> I will let you know how this one turns out. I hope this is helpful.
>
> Many, many thanks, fellas!!!
>
> Jose
>
>> On Nov 18, 2014, at 8:33 PM, Michael Hunger
>> <[email protected]> wrote:
>>
>> Hi José,
>>
>> Can you perhaps provide more detail about your dataset (e.g. a sample of the
>> CSV, its size, etc.; perhaps the output of csvstat (from csvkit) would be
>> helpful) and the Cypher queries you use to load it?
>>
>> Have you seen my other blog post, which explains two big caveats that people
>> run into when trying this? jexp.de/blog/2014/10/load-cvs-with-success/
>>
>> Cheers, Michael
>>
>>> On Tue, Nov 18, 2014 at 8:43 PM, Kenny Bastani <[email protected]> wrote:
>>> Hey Jose,
>>>
>>> There is definitely an answer. Let me put you in touch with the data import
>>> master: Michael Hunger.
>>>
>>> Michael, I think the answers here will be pretty straightforward for you.
>>> You met Jose at GraphConnect NY last year, so I'll spare any introductions.
>>> The memory map configurations I provided need to be calculated and
>>> customized for the data import volume.
>>>
>>> Thanks,
>>>
>>> Kenny
>>>
>>> Sent from my iPhone
>>>
>>> On Nov 18, 2014, at 11:37 AM, José F. Morales Ph.D. <[email protected]>
>>> wrote:
>>>
>>>> Kenny,
>>>>
>>>> In 3 hours it will have been trying to load for 24 hours, so this is not
>>>> working. I'm catching shit from my crew too, so I've got to fix this, like,
>>>> soon.
>>>>
>>>> I haven't done this before, but can I break up the data and load it in
>>>> pieces?
>>>>
>>>> Jose
>>>>
>>>>> On Nov 17, 2014, at 3:35 PM, Kenny Bastani <[email protected]> wrote:
>>>>>
>>>>> Hey Jose,
>>>>>
>>>>> Try turning off the object cache. Add this line to the neo4j.properties
>>>>> configuration file:
>>>>>
>>>>> cache_type=none
>>>>>
>>>>> Then retry your import. Also, enable memory-mapped files by adding these
>>>>> lines to the neo4j.properties file:
>>>>>
>>>>> neostore.nodestore.db.mapped_memory=2048M
>>>>> neostore.relationshipstore.db.mapped_memory=4096M
>>>>> neostore.propertystore.db.mapped_memory=200M
>>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Kenny
>>>>>
>>>>>
>>>>> From: José F. Morales Ph.D. <[email protected]>
>>>>> Sent: Monday, November 17, 2014 12:32 PM
>>>>> To: Kenny Bastani
>>>>> Subject: latest
>>>>>
>>>>> Hey Kenny,
>>>>>
>>>>> Here's the deal. As I think I said, I loaded the 41 MB file of Cypher
>>>>> code via the neo4j shell. Before I tried the LabCards file, I tried the
>>>>> movies file and a UMLS database I made (8k relationships). They worked
>>>>> fine.
>>>>>
>>>>> The LabCards file is taking a LONG time to load: I started at about
>>>>> 9:30-10 PM last night and it's 3 PM now.
>>>>>
>>>>> I've wondered if it's hung up, but Activity Monitor's memory usage is
>>>>> constant at two rows of Java at 4 GB, with the kernel at 1 GB. The CPU
>>>>> panel changes a lot, so it looks like it's doing its thing.
>>>>>
>>>>> So is this how things are expected to go? Do you think the loading is
>>>>> gonna take a day or two?
>>>>>
>>>>> Jose
>>>>>
>>>>>
>>>>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>>>>> José F. Morales Ph.D.
>>>>> Instructor
>>>>> Cell Biology and Pathology
>>>>> Columbia University Medical Center
>>>>> [email protected]
>>>>> 212-452-3351
>>>>
>>>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>>>> José F. Morales Ph.D.
>>>> Instructor
>>>> Cell Biology and Pathology
>>>> Columbia University Medical Center
>>>> [email protected]
>>>> 212-452-3351
>
> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
> José F. Morales Ph.D.
> Instructor
> Cell Biology and Pathology
> Columbia University Medical Center
> [email protected]
> 212-452-3351
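One note on the relationship statement quoted above: it refers to identifiers (CLT_1, CLT_TARGET_3617) that only exist inside the same Cypher statement, so once the file is split up as Michael suggests, those references no longer resolve. A minimal sketch of the match-by-label-and-property alternative, assuming MEDCODE uniquely identifies each node (the target's MEDCODE of 3617 is just a placeholder):

begin
// look the two endpoints up by label & indexed property, then connect them
MATCH (src:TEST_NAME {MEDCODE:10010})
MATCH (tgt:TEST_NAME {MEDCODE:3617})   // placeholder MEDCODE for CLT_TARGET_3617
CREATE (src)-[:MEASUREMENT_OF {Phylum:'TZ', CAT:'TEST.NAME', Ui_Rl:'T157',
  Semantic_Distance_Score:'NA', Path_Length:'NA', Path_Steps:'NA'}]->(tgt);
commit

Whether MEDCODE is really the right key is for José to confirm; the point is only that each statement stands on its own, so the shell can commit in small batches.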
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
