Make sure to have a look at my blog posts:
1. elaborating the individual cypher commands:
if you create nodes and then later match them to connect:
// create indexes
create index on :Movie(title);
create index on :Person(name);
// or alternatively unique constraints
create constraint on (m:Movie) assert m.title is unique;
create constraint on (p:Person) assert p.name is unique;
begin
create (:Movie {title:"The Matrix", ...});
create (:Person {name:"Keanu Reeves", ...});
....
// match + create rel
match (m:Movie {title:"The Matrix"}), (p:Person {name:"Keanu Reeves"})
create (p)-[:ACTED_IN {role:"Neo"}]->(m);
...
commit
2. load csv
http://jexp.de/blog/2014/10/load-cvs-with-success/
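To make that CSV route concrete, here is a minimal sketch (untested against your data: the file paths, the MyNode label, the property names, and the 10,000-row batch size are placeholders; also note that plain Cypher cannot take the relationship type from a CSV column, so you need one statement per relationship type):

```cypher
// before importing CSV2, index the lookup property:
create index on :MyNode(my_node_id);

// CSV1 -- nodes: my_node_id,label,node_prop_01,...,node_prop_ZZ
using periodic commit 10000
load csv with headers from "file:/path/to/nodes.csv" as row
create (:MyNode {my_node_id: toInt(row.my_node_id),
                 node_prop_01: row.node_prop_01});

// CSV2 -- rels: source_my_node_id,dest_my_node_id,rel_type,rel_prop_01,...
// one such statement per relationship type
using periodic commit 10000
load csv with headers from "file:/path/to/rels.csv" as row
match (src:MyNode {my_node_id: toInt(row.source_my_node_id)}),
      (dst:MyNode {my_node_id: toInt(row.dest_my_node_id)})
create (src)-[:SOME_REL_TYPE {rel_prop_01: row.rel_prop_01}]->(dst);
```

The `using periodic commit` clause keeps transactions small, which is what avoids the memory blow-ups discussed below.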
On Wed, Nov 19, 2014 at 8:36 PM, José F. Morales <[email protected]> wrote:
> OK Fellas,
>
> As you might imagine, the last effort I made didn’t work either, even though
> I cut down the relationships a lot. Same story: Java maxing out at 4 GB and not
> doing anything for 12+ hours.
>
> OK, so here is my understanding of the Approaches and my likely course of
> action. Some aspects you guys cite I’m not familiar with…particularly
> indexes and constraints. I’ve never used them before. I’m going to look for
> examples that can give me an idea of how to do them.
>
> Approaches:
>
> 1. Approach 1:
> a. “Creating only small subgraphs with a single statement separated by
> semicolons. Eg up to 100 nodes and rels”
> b. surround blocks of statements with begin and commit commands
> c. I am assuming that this approach involves cypher statements uploaded
> via the neo4j shell
>
> I am assuming that the format you are referring to is similar to what was
> used in the movies db. There, a few nodes were created, then the
> relationships that used them were created and so on. Since I used the
> “movies” DB as my model, I did not use the “begin” and “commit” commands in
> my previous code. They seemed to work fine and I didn’t know I needed
> them. I will look up how to use them. However, this means making sure the
> nodes and relationships are in the proper order. That’ll take a little
> work.
>
> 2. Approach 2
> a. “…create nodes and later match them by label & property to connect them”
> b. surround blocks of statements with begin and commit commands
> c. I am assuming that this approach involves cypher statements uploaded
> via the neo4j shell
>
> I am not sure exactly what you mean here in terms of “…match them by label
> & property to connect them”
>
> 3. CSV approach
> a. “Dump the base into 2 .csv files:”
> b. CSV1: “Describe nodes (enumerate them via some my_node_id integer
> attribute), columns: my_node_id,label,node_prop_01,node_prop_ZZ”
> c. CSV2: “Describe relations, columns: source_my_node_id,
> dest_my_node_id,rel_type,rel_prop_01,...,rel_prop_NN”
> d. Indexes / constraints: before starting the import, have appropriate
> indexes / constraints in place
> e. via LOAD CSV, import CSV1, then CSV2.
> f. Import no more than 10,000-30,000 lines in a single LOAD CSV statement
>
> This seems to be a very well elaborated method and the easiest for me to
> do. I have files such that I can create these without too much problem. I
> figure I’ll split the nodes into three files 20k rows each. I can do the
> same with the Rels. I have not used indexes or constraints yet in the db’s
> that I already created and as I said above, I’ll have to see how to use
> them.
>
> I am assuming the column headers that fit my data are consistent with
> what you explained below (i.e., I can put my own meaningful text into
> label1 - label10 and node_prop_01 - 05):
>
> my_node_id, label1, label2, label3, label4, label5, label6, label7,
> label8, label9, label10, node_prop_01, node_prop_02, node_prop_03,
> node_prop_04, node_prop_ZZ
>
> Thanks again Fellas!!
>
> Jose
>
>
> On Wednesday, November 19, 2014 8:04:44 AM UTC-5, Michael Hunger wrote:
>>
>> José,
>>
>> Let's continue the discussion on the google group
>>
>> By “larger” I meant the amount of data, not the size of the statements.
>>
>> As I also point out in various places, we recommend creating only small
>> subgraphs with a single statement, separated by semicolons,
>> e.g. up to 100 nodes and rels.
>>
>> Gigantic statements just make the parser explode.
>>
>> I recommend splitting them up into statements creating subgraphs,
>> or create nodes and later match them by label & property to connect them.
>> Make sure to have appropriate indexes / constraints.
>>
>> You should also surround blocks of statements with begin and commit
>> commands
>>
>> Sent from my iPhone
>>
>> On 19.11.2014 at 04:18, José F. Morales Ph.D. <[email protected]> wrote:
>>
>> Hey Michael and Kenny
>>
>> Thank you guys a bunch for the help.
>>
>> Let me give you a little background. I am charged to make a prototype of
>> a tool (“LabCards”) that we hope to use in the hospital and beyond at some
>> point. In preparation for making the main prototype, I made two prior
>> Neo4j databases that worked exactly as I wanted them to. The first
>> database was built with NIH data and had 183 nodes and around 7500
>> relationships. The second database was the Pre-prototype and it had 1080
>> nodes and around 2000 relationships. I created these in the form of cypher
>> statements and either pasted them in the Neo4j browser or used the neo4j
>> shell and loaded them as text files. Before doing that I checked the cypher
>> code with Sublime Text 2 that highlights the code. Both databases loaded
>> fine in both methods and did what I wanted them to do.
>>
>> As you might imagine, the prototype is an expansion of the
>> mini-prototype. It has almost the same data model and I built it as a
>> series of cypher statements as well. My first version of the prototype had
>> ~60k nodes and 160k relationships.
>>
>> I should say that a feature of this model is that all the source and
>> target nodes have relationships that point to each other. No node points
>> to itself as far as I know. This file was 41 Mb of cypher code that I tried
>> to load via the neo4j shell.
>>
>> In fact, I was following your advice on loading big data files... “Use
>> the Neo4j-Shell for larger Imports”
>> (http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/).
>> This first time out, Java maxed out its allocated memory at 4 GB (two
>> processes) and did not complete loading in 24 hours. I killed it.
>>
>> I then contacted Kenny, and he generously gave me some advice regarding
>> the properties file (below) and again the same deal (4 Gb Memory 2x) with
>> Java and no success in about 24 hours. I killed that one too.
>>
>> Given my loading problems, I have subsequently eliminated a bunch of
>> relationships (100k), so the file is now 21 Mb. A lot of these were
>> duplicates that I didn’t pick up before, and I am trying it again. So far,
>> 15 min into it, similar situation. The difference is that Java is using 1.7
>> and 0.5 GB of memory.
>>
>> Here is the cypher for a typical node…
>>
>> CREATE (CLT_1:`CLT SOURCE`:BIOMEDICAL:TEST_NAME:`Laboratory
>> Procedure`:lbpr:`Procedures`:PROC:T059:`B1.3.1.1`:TZ {NAME:'Acetoacetate
>> (ketone body)', SYNONYM:'', Sample:'SERUM, URINE', MEDCODE:10010, CUI:'NA'})
>>
>> Here is the cypher for a typical relationship...
>>
>> CREATE (CLT_1)-[:MEASUREMENT_OF {Phylum:'TZ', CAT:'TEST.NAME',
>> Ui_Rl:'T157', RESULT:'', Type:'', Semantic_Distance_Score:'NA',
>> Path_Length:'NA', Path_Steps:'NA'}]->(CLT_TARGET_3617),
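Following the create-then-match advice at the top of the thread, a node/relationship pair like the one quoted above could be split into separate small statements, roughly like this (a sketch only: the index, the choice of MEDCODE as the lookup key, and the target MEDCODE value 99999 are assumptions for illustration):

```cypher
// assumed: MEDCODE uniquely identifies each test node
create index on :TEST_NAME(MEDCODE);

begin
create (:`CLT SOURCE`:TEST_NAME {NAME:'Acetoacetate (ketone body)', MEDCODE:10010});
// ... more small create statements, up to ~100 nodes per transaction ...
commit

begin
// later: connect nodes by matching on the indexed property instead of
// referring back to identifiers from one giant statement
match (src:TEST_NAME {MEDCODE:10010}), (dst:TEST_NAME {MEDCODE:99999})
create (src)-[:MEASUREMENT_OF {CAT:'TEST.NAME', Ui_Rl:'T157'}]->(dst);
commit
```

This removes the need to keep every node identifier (e.g. CLT_TARGET_3617) alive inside a single 41 Mb statement, which is what makes the parser blow up.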
>>
>> I will let you know how this one turns out. I hope this is helpful.
>>
>> Many, many thanks fellas!!!
>>
>> Jose
>>
>> On Nov 18, 2014, at 8:33 PM, Michael Hunger <[email protected]>
>> wrote:
>>
>> Hi José,
>>
>> Can you perhaps provide more detail about your dataset (e.g. a sample of
>> the CSV, its size, etc.; an output of csvstat (from csvkit) would be
>> helpful), and the Cypher queries you use to load it?
>>
>> Have you seen my other blog post, which explains two big caveats that
>> people run into when trying this?
>> jexp.de/blog/2014/10/load-cvs-with-success/
>>
>> Cheers, Michael
>>
>> On Tue, Nov 18, 2014 at 8:43 PM, Kenny Bastani <[email protected]>
>> wrote:
>>
>>> Hey Jose,
>>>
>>> There is definitely an answer. Let me put you in touch with the data
>>> import master: Michael Hunger.
>>>
>>> Michael, I think the answers here will be pretty straightforward for
>>> you. You met Jose at GraphConnect NY last year, so I'll spare any
>>> introductions. The memory map configurations I provided need to be
>>> calculated and customized for the data import volume.
>>>
>>> Thanks,
>>>
>>> Kenny
>>>
>>> Sent from my iPhone
>>>
>>> On Nov 18, 2014, at 11:37 AM, José F. Morales Ph.D. <[email protected]>
>>> wrote:
>>>
>>> Kenny,
>>>
>>> In 3 hours it’ll be trying to load for 24 hours so this is not
>>> working. I’m catching shit from my crew too, so I got to fix this like
>>> soon.
>>>
>>> I haven’t done this before, but can I break up the data and load it in
>>> pieces?
>>>
>>> Jose
>>>
>>> On Nov 17, 2014, at 3:35 PM, Kenny Bastani <[email protected]> wrote:
>>>
>>> Hey Jose,
>>>
>>> Try turning off the object cache. Add this line to the
>>> neo4j.properties configuration file:
>>>
>>> cache_type=none
>>>
>>> Then retry your import. Also, enable memory mapped files by adding these
>>> lines to the neo4j.properties file:
>>>
>>> neostore.nodestore.db.mapped_memory=2048M
>>> neostore.relationshipstore.db.mapped_memory=4096M
>>> neostore.propertystore.db.mapped_memory=200M
>>> neostore.propertystore.db.strings.mapped_memory=500M
>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>
>>> Thanks,
>>>
>>> Kenny
>>>
>>> ------------------------------
>>> *From:* José F. Morales Ph.D. <[email protected]>
>>> *Sent:* Monday, November 17, 2014 12:32 PM
>>> *To:* Kenny Bastani
>>> *Subject:* latest
>>>
>>> Hey Kenny,
>>>
>>> Here’s the deal. As I think I said, I loaded the 41 Mb file of cypher
>>> code via the neo4j shell. Before I tried the LabCards file, I tried the
>>> movies file and a UMLS database I made (8k relationships). They worked
>>> fine.
>>>
>>> The LabCards file is taking a LONG time to load since I started at
>>> about 9:30 - 10 PM last night and its 3PM now.
>>>
>>> I’ve wondered if it’s hung up, but the activity monitor’s memory usage
>>> is constant at two rows of Java at 4 GB with the kernel at 1 GB. The CPU
>>> panel changes a lot, so it looks like it’s doing its thing.
>>>
>>> So is this how things are expected to go? Do you think the loading is
>>> gonna take a day or two?
>>>
>>> Jose
>>>
>>>
>>> |//.\\||//.\\|||//.\\||//.\\|||//.\\||//.\\||
>>> José F. Morales Ph.D.
>>> Instructor
>>> Cell Biology and Pathology
>>> Columbia University Medical Center
>>> [email protected]
>>> 212-452-3351
>>>
>>>
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>