OK Fellas,
What do you think of this?
Did this first...
auto-index LC_ID
Then this...
import-cypher -d , -i SAMPLE/Tz/Total_RELS_2.csv -b 1000
MATCH (LEFT_NODE {LC_ID:{LEFT_NODE}}), (RIGHT_NODE {LC_ID:{RIGHT_NODE}})
CREATE LEFT_NODE-[:#{REL} {PHYLUM:#{PHYLUM}, CAT:#{CAT}, UI_RL:#{UI_RL}, RESULT:#{RESULT}, INT_TYPE:#{INT_TYPE}, DEG:toINT(#{DEG}), SDS_TD:toFloat(#{SDS_TD}), Path_L_TD:toINT(#{Path_L_TD}), Path_S_TD:#{Path_S_TD}}]->RIGHT_NODE
return *
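
For comparison, here is a minimal sketch of the same lookup with a label on both MATCH patterns. Neither pattern in the statement above names a label, and a schema index created with CREATE INDEX ON is only considered when the pattern names the label the index was built on. The label :Item and the relationship type RELATED_TO are hypothetical placeholders, not taken from the data set; {LEFT_NODE}, {RIGHT_NODE} and {DEG} are assumed to be supplied as parameters, one set per CSV row.

```cypher
// Hypothetical label :Item; replace it with the label(s) the nodes actually carry.
CREATE INDEX ON :Item(LC_ID);

// Both patterns name the indexed label, so each lookup can go through the index
// instead of scanning every node in the store.
MATCH (l:Item {LC_ID: {LEFT_NODE}}), (r:Item {LC_ID: {RIGHT_NODE}})
CREATE (l)-[:RELATED_TO {DEG: toInt({DEG})}]->(r);
```

The legacy auto-index enabled by auto-index LC_ID is a separate mechanism and, as far as I know, is not consulted by a plain MATCH pattern like this one.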
On Thursday, December 4, 2014 6:27:53 PM UTC-5, Michael Hunger wrote:
>
> Perhaps you should show the statement too? Not just the log output? :)
>
> use this: CREATE INDEX ON :{Label}(LC_ID); <- replace with your label(s)
>
> On Fri, Dec 5, 2014 at 12:09 AM, José F. Morales wrote:
>
>> Andrii and Michael,
>>
>> Sorry for the delay in response. I was a little under the weather.
>> ANYHOW, it looks like I figured out how to do the data loading! I was
>> trying several approaches and the one using Michael's shell tools seems to
>> have worked! There was info from Andrii that proved important as well
>> (my_node_ID as integer). The loading of the 18k NODES was in seconds. When
>> I tested the RELS with a tiny data set it worked perfectly. I am cleaning
>> up the 52k RELS file after the first attempt failed because of a missing "
>> ' ".
>>
>> My only issue is that the RELs loading is slow....
>>
>> commit after 1000 row(s) 0. 1%: nodes = 0 rels = 1000 properties = 7000
>> time 7059450 ms total 7059450 ms
>>
>> Now I thought that if I created an index (below), it would be faster.
>> Apparently not.
>>
>> neo4j-sh (?)$ auto-index LC_ID
>>
>> Enabling auto-indexing of Node properties: [LC_ID]
>>
>> Do I have this wrong? Should it have been CREATE INDEX ON :LC_ID?
>>
>> Jose
>>
>>
>> On Monday, December 1, 2014 5:09:36 PM UTC-5, Andrii Stesin wrote:
>>>
>>> Hi José,
>>>
>>> On Monday, December 1, 2014 12:33:58 AM UTC+2, José F. Morales wrote:
>>>>
>>>> Ok, but how many valid distinct combinations of your 10 node labels may
>>>>> exist?
>>>>>
>>>>
>>>> JFM: 264
>>>>
>>>
>>> This makes me think that maybe your target data model needs some
>>> refactoring. What are the entities (classes), and what would be better
>>> treated as attributes? Again, I'm not familiar with LabCard, so if
>>> you give some explanation and a publicly available sample dataset,
>>> I'd take a close look at it.
>>>
>>>
>>>> JFM: Like I said, there are 264 unique combinations in all my nodes.
>>>>> Some are redundant, full spelling of a term/phrase and an abbreviation.
>>>>> Some are a code for a term/phrase. Some were created in anticipation of
>>>>> other values I would create later. I am trying to anticipate queries
>>>>> I'll make later.
>>>>>
>>>>
>>> Once again, I foresee a data modelling issue here.
>>>
>>>
>>>> JFM: Makes sense for speed. I guess it depends upon the size of one's
>>>>>> data.
>>>>>>
>>>>>
>>> Sure it does :)
>>>
>>>
>>>> Q3: “Skewer” is just an integer right? It corresponds in a way to
>>>>>> my_node_id
>>>>>>
>>>>>
>>>>> No, it's a label! so in Cypher your node (suppose it has 2 labels
>>>>> :LabelA and :LabelJ ) is described like
>>>>>
>>>>> MATCH (n:LabelA:LabelJ:Skewer {my_node_id: 123454, p1: 'something', p2
>>>>> : 'something else', p3: 'etc.'})
>>>>>
>>>>>
>>>> JFM: Got that!
>>>>
>>>> JFM: ok basic question... MATCH (n: <---What is "n"? Does it just
>>>> indicate that it's a node of a particular class? Which letter it is is
>>>> arbitrary, right? Is there a name for what "n" is? For a while there, I
>>>> thought it was *my_node_ID*.
>>>>
>>>
>>> *n* is just the name of a variable. Cypher, like any other programming
>>> language, has a notion of a "variable", which has a name and which can take
>>> different values; here I've chosen *n* arbitrarily as the
>>> variable name.
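>>>
>>> A small sketch of that point, assuming the :LabelA label and the
>>> my_node_id property used earlier in this thread. Both queries match the
>>> same nodes; only the variable name (and so the result column header)
>>> differs.
>>>
>>> ```cypher
>>> // "n" is an arbitrary variable name bound to each matched node.
>>> MATCH (n:LabelA) RETURN n.my_node_id;
>>>
>>> // Same query with a different, equally arbitrary variable name.
>>> MATCH (anyNameYouLike:LabelA) RETURN anyNameYouLike.my_node_id;
>>> ```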
>>>
>>>
>>>> Q4: So does repeating the LOAD CSV with each file CLT_NODES_LabelA…J
>>>>>> combine the various labels and their respective values with their
>>>>>> corresponding nodes?
>>>>>>
>>>>>
>>>>> Label is not a variable, it does not have a value. It's just a label,
>>>>> consider "tag".
>>>>> Also *my_node_id* IS a variable so it does have a value.
>>>>>
>>>>
>>>> JFM: OK, I am not understanding this. I understood a "Label" as a
>>>> general category for a node.
>>>>
>>>
>>> That's OK, or maybe it is even better to imagine a tag. A node may have
>>> multiple tags (labels), and they can be added and/or removed.
>>>
>>>
>>>> This was as opposed to a "Property" that was specific to a particular
>>>> node. As I understood it, a "Label" has different values.
>>>>
>>>
>>> A label is just a label. It doesn't have any value itself, it just marks
>>> (tags) some (sub)set of your nodes and allows you to distinguish between
>>> them. Labels may overlap. Consider the automotive domain, and let's look
>>> at a data model for it.
>>>
>>> Brand seems to be better modelled as a label. Say `Opel`, `Volvo` or
>>> `Peugeot`.
>>> Kind of vehicle is definitely(???) a label. Say `Truck`, `SUV`, `Car`.
>>> How to model deeper things depends on what you are trying to
>>> achieve.
>>> Is body color a label or a property? Which approach is better: either
>>>
>>> MATCH (vhcl:Truck:Volvo {body_color: 'red', VIN:
>>> 'VE18727673826812634X65' })
>>>
>>> or
>>>
>>> MATCH (vhcl:Opel:Yellow:SUV {VIN: 'VE18727673826812634X65'})
>>>
>>> ? I'm not sure, it depends on the goal. As for me, I'd prefer color to be
>>> a property of one exact single car (since you can decide to repaint your
>>> yellow car white or some other color, after all)
>>>
>>> But VIN is *definitely* a property of one exact single car.
>>>
>>> Is a car's license plate a label or a property? Definitely neither,
>>> because you can sell your car and the new owner will get another license
>>> plate for it, so I'd model this as
>>>
>>> MATCH (vhcl:Car:Ford {body_color: 'pink', VIN: 'FGT87356873HU8745'})-[:
>>> HAS_LICENSE_PLATE]->(lp:LicensePlate {state: 'AL', str: 'WH4TWR'})
>>>
>>>
>>> but as you see, `LicensePlate` obviously should never be mixed with
>>> either `Car` or `Truck`, so they are different labels which do not
>>> intersect.
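>>>
>>> To make the color trade-off concrete, here is a rough sketch of what a
>>> repaint looks like under each model. It reuses the VINs from the examples
>>> above; the :White label is a hypothetical addition for illustration.
>>>
>>> ```cypher
>>> // Color as a property: repainting just overwrites the value.
>>> MATCH (vhcl:Car {VIN: 'FGT87356873HU8745'})
>>> SET vhcl.body_color = 'white';
>>>
>>> // Color as a label: repainting means removing one label and adding another.
>>> MATCH (vhcl:SUV {VIN: 'VE18727673826812634X65'})
>>> REMOVE vhcl:Yellow
>>> SET vhcl:White;
>>> ```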
>>>
>>> So that Label could be "Category" and there could be two categories, for
>>>> example... CLT_SOURCE and CLT_TARGET . I thought that makes it like a
>>>> variable. If not, the label is all the same on a given set of nodes and
>>>> what's the point in that?
>>>>
>>>> JFM: OK, I get that *my_node_id *is a variable.
>>>>
>>>
>>> Agh, exactly.
>>>
>>>
>>>>
>>>>> 1. When doing LabelA .csv you will create whatever uniquely
>>>>> numbered nodes were not already in the database, fill their properties
>>>>> (or maybe overwrite them?) and label the node (be it a new or an existing
>>>>> one) with LabelA, no matter what other labels the node (possibly) had,
>>>>>
>>>>> JFM: OK. I get it.
>>>>
>>>>>
>>>>> 1. When doing LabelJ .csv you *again* will create whatever
>>>>> uniquely numbered nodes were not already in the database, *again*
>>>>> either fill or overwrite properties, and *again* label the node (be it
>>>>> a new or an existing one) with LabelJ, no matter what other labels the
>>>>> node (possibly) had,
>>>>>
>>>>> JFM: OK. I get it.
>>>>
>>>>>
>>>>> 1. so if you created some node with the first file and labeled it
>>>>> LabelA, and the same unique *my_node_id* occurs in both the first and
>>>>> second files, your node will get 2 labels, LabelA and LabelJ.
>>>>>
>>>>> JFM: That's what I want!!
>>>>
>>>
>>> Huh, Ok so far :)
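>>>
>>> The per-file procedure described above could be sketched roughly like
>>> this. The file URLs are hypothetical, and my_node_id is assumed to sit in
>>> the first column of each file; :Skewer is the common label discussed
>>> earlier in the thread.
>>>
>>> ```cypher
>>> // First file: create the node if it is missing, then tag it with LabelA.
>>> LOAD CSV FROM 'file:///CLT_NODES_LabelA.csv' AS csvline
>>> MERGE (n:Skewer {my_node_id: toInt(csvline[0])})
>>> SET n:LabelA;
>>>
>>> // Second file: MERGE finds the existing node by id, so it only gains LabelJ.
>>> LOAD CSV FROM 'file:///CLT_NODES_LabelJ.csv' AS csvline
>>> MERGE (n:Skewer {my_node_id: toInt(csvline[0])})
>>> SET n:LabelJ;
>>> ```
>>>
>>> A node whose id appears in both files ends up with :Skewer, :LabelA and
>>> :LabelJ.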
>>>
>>>
>>>> Q5: Since I think of my data in terms of the two classes of nodes in my
>>>>>> Data model …[CLT_SOURCE —> CLT_TARGET ; CLT_TARGET —> CLT_SOURCE],
>>>>>> after loading the nodes, how do I then get two classes of nodes?
>>>>>>
>>>>>
>>>>> Make them 2 labels: CLTSource and CLTTarget respectively.
>>>>>
>>>>
>>>> JFM: OK. Regarding the labels...my csv file has a column called DESC
>>>> that has two values, CLT_SOURCE and CLT_TARGET. You are saying that my
>>>> source csv should have a CLT_SOURCE column and my target csv
>>>> should have a CLT_TARGET column? My csv files should NOT have the
>>>> configuration I described?
>>>>
>>>
>>> What does CLT really mean in real life? I failed to parse it :( sorry
>>> for that. Once again, if you describe the LabCard domain and provide
>>> me with a dataset, I'd be able to give you some better ideas (this may also
>>> become a good tutorial sample case for future Neo4j users).
>>>
>>>
>>>> JFM: Since my csv file has, in its A thru J columns, A (2) values, B (1),
>>>> C (4), D (83), E (83), F (11), G (11), H (83), J (83), K (2), I should
>>>> have A LOT of csv files instead of just two for nodes!
>>>>
>>>
>>> Again, I strongly suspect a data modelling issue here.
>>>
>>>
>>>> JFM: What I am not getting from this is that there is one csv file that has
>>>>>> the CLTSOURCE and CLTTARGET labels in it. That contradicts what I said
>>>>>> above because that would make only 1 csv file. I assume there is one
>>>>>> LOAD CSV statement, and that my_node_ID:TOINT(csvline[0]) and
>>>>>> my_node_ID:TOINT(csvline[1]) presumably refer to two lines in that
>>>>>> file.
>>>>>>
>>>>>
>>> As soon as you have both the src and target nodes inside the
>>> database, you need a .csv file which describes only relationships: the
>>> 1st column contains src node ids, the 2nd column contains dst node ids, and
>>> thus 1 row of the .csv describes 1 single relationship per (linked) pair of
>>> nodes.
>>>
>>> For a .csv with relationships, csvline[0] is the value of the *my_node_id*
>>>>>>> property of the *source* node, csvline[1] is the value of the
>>>>>>> *my_node_id* property of the *target* node, and TOINT() type conversion
>>>>>>> is used because my personal preference is to use integers for ids.
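>>>
>>> Put together, a relationship load along those lines might look like the
>>> sketch below. The file URL and the relationship type LINKS_TO are
>>> hypothetical; the column positions follow the description above
>>> (csvline[0] is the source id, csvline[1] the target id).
>>>
>>> ```cypher
>>> // Each CSV row names one (source, target) pair of already-loaded nodes.
>>> LOAD CSV FROM 'file:///RELS.csv' AS csvline
>>> MATCH (src:Skewer {my_node_id: toInt(csvline[0])}),
>>>       (dst:Skewer {my_node_id: toInt(csvline[1])})
>>> CREATE (src)-[:LINKS_TO]->(dst);
>>> ```
>>>
>>> An index on :Skewer(my_node_id) would likely matter here, since every row
>>> performs two lookups by id.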
>>>>>>>
>>>>>>
>>>>>
>>>>>> Is it that ToInt(csvline[0]) refers to a line of the REL.csv
>>>>>> file?
>>>>>>
>>>>>> Does csvline[0] refer to a column in REL.csv as do csvline[2] and
>>>>>> csvline[ZZ] (line 3) ?
>>>>>>
>>>>>
>>>>>
>>>> JFM: OK, I think I get it.
>>>>
>>>>
>>>>> I think you can combine the import of multiple .CSV files in a single LOAD
>>>>> CSV statement, but I have never tried this.
>>>>>
>>>>> WBR,
>>>>> Andrii
>>>>>
>>>>>
>>>>
>>>> JFM: Thanks!
>>>>
>>>
>>> :)
>>>
>>> WBR,
>>> Andrii
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>