Re: [Neo4j] Re: large cypher statements

José F . Morales Thu, 04 Dec 2014 17:29:39 -0800

Some questions...

On Thursday, December 4, 2014 7:00:16 PM UTC-5, Michael Hunger wrote:
>
> No, not at all
> auto-index is for legacy indexes
> do the create index that I said
>

Got it.


> in your MATCH you _must_ provide the label then.
>
> MATCH (LEFT_NODE:LABEL1 {LC_ID:{LEFT_NODE}}),
>       (RIGHT_NODE:LABEL2 {LC_ID:{RIGHT_NODE}})
> ..
>

I have a node label column whose header in the CSV is DESC, with two values
that, for brevity, I'll call s and t.
Do you mean I should write

MATCH (LEFT_NODE:s {LC_ID:{LEFT_NODE}}), (RIGHT_NODE:t {LC_ID:{RIGHT_NODE}})

or

MATCH (LEFT_NODE:DESC {LC_ID:{LEFT_NODE}}), (RIGHT_NODE:DESC {LC_ID:{RIGHT_NODE}})?
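
For anyone following along, here is a minimal sketch in Python of how the index and the MATCH line up. It assumes the nodes were loaded with the values of the DESC column (s and t) as their labels, in which case both the index and the MATCH would name :s and :t, not :DESC. If the nodes actually carry a :DESC label, pass that instead. The helper names index_statement and match_clause are hypothetical and are not part of neo4j-shell or the shell tools.

```python
import re

# Labels cannot be passed as Cypher parameters, so they are checked
# against a conservative pattern before being spliced into the text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    if not _IDENT.match(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return name


def index_statement(label, prop="LC_ID"):
    """One schema index per label, following CREATE INDEX ON :{Label}(LC_ID)."""
    return "CREATE INDEX ON :%s(%s)" % (_check(label), _check(prop))


def match_clause(left_label, right_label, prop="LC_ID"):
    """MATCH with explicit labels; the id values stay Cypher parameters."""
    left, right, prop = _check(left_label), _check(right_label), _check(prop)
    return (
        "MATCH (LEFT_NODE:" + left + " {" + prop + ":{LEFT_NODE}}), "
        "(RIGHT_NODE:" + right + " {" + prop + ":{RIGHT_NODE}})"
    )


if __name__ == "__main__":
    for label in ("s", "t"):  # assumed labels, taken from the question above
        print(index_statement(label))
    print(match_clause("s", "t"))
```

The point of the sketch is only that the label named in the MATCH has to be the same label the index was created on, otherwise the lookup cannot use the index.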
 

>
> You should also _never_ use #{} expressions for values, *only* for labels 
> and rel-types.
> Only use Cypher parameters: {CAT}.
>

Yes, got it. It worked.
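
A minimal sketch of that split, written in Python and not in the shell tool itself: the relationship type and the labels are spliced into the statement text, and every value travels as a {name} parameter. The property names and the toInt/toFloat conversions are copied from the statement quoted further down; the function name build, the labels s and t, and the relationship type BINDS are assumptions made up for illustration.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Property names copied from the import statement quoted in this thread.
PLAIN_PROPS = ["PHYLUM", "CAT", "UI_RL", "RESULT", "INT_TYPE", "Path_S_TD"]
CONVERTED_PROPS = {"DEG": "toInt", "SDS_TD": "toFloat", "Path_L_TD": "toInt"}


def build(row, left_label, right_label):
    """Return (statement, params) for one CSV row given as a dict."""
    # Only the rel-type and labels become part of the statement text.
    for name in (row["REL"], left_label, right_label):
        if not _IDENT.match(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    props = ["%s:{%s}" % (p, p) for p in PLAIN_PROPS]
    props += ["%s:%s({%s})" % (p, fn, p)
              for p, fn in sorted(CONVERTED_PROPS.items())]
    statement = (
        "MATCH (l:%s {LC_ID:{LEFT_NODE}}), (r:%s {LC_ID:{RIGHT_NODE}}) "
        "CREATE (l)-[:%s {%s}]->(r)"
        % (left_label, right_label, row["REL"], ", ".join(props))
    )
    # Everything except the rel-type is handed over as a parameter value.
    params = {k: v for k, v in row.items() if k != "REL"}
    return statement, params


if __name__ == "__main__":
    row = {"REL": "BINDS", "LEFT_NODE": "1", "RIGHT_NODE": "2",
           "PHYLUM": "p", "CAT": "c", "UI_RL": "u", "RESULT": "r",
           "INT_TYPE": "i", "DEG": "3", "SDS_TD": "0.5",
           "Path_L_TD": "4", "Path_S_TD": "x"}  # made-up sample row
    statement, params = build(row, "s", "t")
    print(statement)
    print(sorted(params))
```

Because the values never appear in the statement text, rows that share a relationship type produce the same statement string, and quoting problems in the data cannot break the query.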
 

> I also saw that you have a ton of relationship-properties. do you think 
> you need them all?
>

I could live without 3 of the 9. 5 are essential, and 1 is a maybe; it could
be useful.
 

> Perhaps there is also a Node / Entity actually hiding in your 
> relationships?
>

I am quite sure that my data model can be improved.  But I wanted to have a 
really simple one to start. Time is a factor now.   Those 5 are very much 
qualities of the relationship.  One of those properties applies only to one 
two types of relationships.  
 

> Michael
>
> On Fri, Dec 5, 2014 at 12:54 AM, José F. Morales <[email protected]> wrote:
>
>> OK Fellas,  
>>
>> What do you think of this?
>>
>> Did this first...
>>
>> auto-index LC_ID
>>
>> Then this...
>>
>> import-cypher -d , -i SAMPLE/Tz/Total_RELS_2.csv -b 1000
>>   MATCH (LEFT_NODE {LC_ID:{LEFT_NODE}}), (RIGHT_NODE {LC_ID:{RIGHT_NODE}})
>>   CREATE LEFT_NODE-[:#{REL} {PHYLUM:#{PHYLUM},CAT:#{CAT},UI_RL:#{UI_RL},RESULT:#{RESULT},INT_TYPE:#{INT_TYPE},DEG:toINT(#{DEG}),SDS_TD:toFloat(#{SDS_TD}),Path_L_TD:toINT(#{Path_L_TD}),Path_S_TD:#{Path_S_TD}}]->RIGHT_NODE
>>   return *
>>
>> On Thursday, December 4, 2014 6:27:53 PM UTC-5, Michael Hunger wrote:
>>>
>>> Perhaps you should show the statement too? Not just the log output? :)
>>>
>>> use this: CREATE INDEX ON :{Label}(LC_ID); <- replace with your label(s)
>>>
>>> On Fri, Dec 5, 2014 at 12:09 AM, José F. Morales <[email protected]> 
>>> wrote:
>>>
>>>> Andrii and Michael,
>>>>
>>>> Sorry for the delay in response. I was a little under the weather.
>>>> ANYHOW, it looks like I figured out how to do the data loading! I was
>>>> trying several approaches and the one using Michael's shell tools seems to
>>>> have worked! Some info from Andrii proved important as well
>>>> (my_node_ID as integer). Loading the 18k NODES took seconds. When I
>>>> tested the RELS with a tiny data set it worked perfectly. I am cleaning
>>>> up the 52k RELS file after the first attempt failed because of a missing
>>>> single quote ( ' ).
>>>>
>>>> My only issue is that the RELs loading is slow....
>>>>
>>>> commit after 1000 row(s)  0. 1%: nodes = 0 rels = 1000 properties = 7000 time 7059450 ms total 7059450 ms
>>>>
>>>> Now I thought that if I created an index (below), it would be faster. 
>>>> Apparently not.  
>>>>
>>>> neo4j-sh (?)$ auto-index LC_ID
>>>>
>>>> Enabling auto-indexing of Node properties: [LC_ID]
>>>>
>>>> Do I have this wrong?  Should it have been CREATE INDEX ON :LC_ID?
>>>>
>>>> Jose
>>>>
>>>>
>>>> On Monday, December 1, 2014 5:09:36 PM UTC-5, Andrii Stesin wrote:
>>>>>
>>>>> Hi José,
>>>>>
>>>>> On Monday, December 1, 2014 12:33:58 AM UTC+2, José F. Morales wrote:
>>>>>>
>>>>>> Ok, but how many valid distinct combinations of your 10 node labels 
>>>>>>> may exist? 
>>>>>>>
>>>>>>
>>>>>> JFM: 264
>>>>>>
>>>>>
>>>>> This makes me think that maybe your target data model needs some
>>>>> refactoring. What are the entities (classes), and what would be better
>>>>> considered as attributes? Again, I'm not familiar with LabCard, so if
>>>>> you give some explanation and a publicly available sample dataset,
>>>>> I'd take a close look at it.
>>>>>  
>>>>>
>>>>>> JFM:  Like I said, there are 264 unique combinations in all my nodes.
>>>>>>> Some are redundant: the full spelling of a term/phrase and an
>>>>>>> abbreviation. Some are a code for a term/phrase. Some were created in
>>>>>>> anticipation of other values I would create later. I am trying to
>>>>>>> anticipate queries I'll make later.
>>>>>>>
>>>>>>
>>>>> Once again, I foresee a data modelling issue here.
>>>>>  
>>>>>
>>>>>> JFM: Makes sense for speed. I guess it depends upon the size of one's 
>>>>>>>> data.
>>>>>>>>
>>>>>>>
>>>>> Sure it does :)
>>>>>  
>>>>>
>>>>>> Q3: “Skewer” is just an integer right?  It corresponds in a way to 
>>>>>>>> my_node_id 
>>>>>>>>
>>>>>>>
>>>>>>> No, it's a label! so in Cypher your node (suppose it has 2 labels 
>>>>>>> :LabelA and :LabelJ ) is described like
>>>>>>>
>>>>>>> MATCH (n:LabelA:LabelJ:Skewer {my_node_id: 123454, p1: 'something', 
>>>>>>> p2: 'something else', p3: 'etc.'})
>>>>>>>
>>>>>>>
>>>>>> JFM: Got that!
>>>>>>
>>>>>> JFM: OK, basic question...  MATCH (n:  <---What is "n"? Does it just
>>>>>> indicate that it's a node of a particular class?  Which letter it is is
>>>>>> arbitrary, right?  Is there a name for what "n" is? For a while there, I
>>>>>> thought it was *my_node_ID*.
>>>>>>
>>>>>
>>>>> *n* is just the name of a variable. Cypher, like any other
>>>>> programming language, has a notion of a "variable", which has its name
>>>>> and which can take different values; here I've chosen *n* arbitrarily
>>>>> for the variable name.
>>>>>  
>>>>>
>>>>>> Q4: So does repeating the LOAD CSV with each file CLT_NODES_LabelA…J 
>>>>>>>> combine the various labels and their respective values with their 
>>>>>>>> corresponding nodes? 
>>>>>>>>
>>>>>>>
>>>>>>> Label is not a variable, it does not have a value. It's just a 
>>>>>>> label, consider "tag".
>>>>>>> Also *my_node_id* IS a variable so it does have a value.
>>>>>>>
>>>>>>
>>>>>> JFM: OK, I am not understanding this.  I understood a "Label" as a 
>>>>>> general category for a node. 
>>>>>>
>>>>>
>>>>> That's Ok, or maybe even better is to imagine a tag. Node may have 
>>>>> multiple tags (labels), they can be added and/or removed.
>>>>>  
>>>>>
>>>>>> This was as opposed to a "Property" that was specific to a particular 
>>>>>> node.  As I understood it, a "Label" has different values.
>>>>>>
>>>>>
>>>>> Label is just a label. It doesn't have any value itself, it just marks
>>>>> (tags) some (sub)set of your nodes and allows you to distinguish between
>>>>> them. Labels may overlap. Consider the automotive domain, and let's take
>>>>> a look at a data model for it.
>>>>>
>>>>> Brand seems to be better modelled as a label. Say `Opel`, `Volvo` or
>>>>> `Peugeot`.
>>>>> Kind of vehicle is definitely(???) a label. Say `Truck`, `SUV`, `Car`.
>>>>> How to model some deeper things depends on what you are going to
>>>>> achieve.
>>>>> Is body color a label or property? Which approach is better: either
>>>>>
>>>>> MATCH (vhcl:Truck:Volvo {body_color: 'red', VIN: 'VE18727673826812634X65'})
>>>>>
>>>>> or
>>>>>
>>>>> MATCH (vhcl:Opel:Yellow:SUV {VIN: 'VE18727673826812634X65'})
>>>>>
>>>>> ? I'm not sure, it depends on the goal. As for me, I'd prefer color to
>>>>> be a property of one exact single car (since you can decide to repaint
>>>>> your yellow car white or some other color, after all).
>>>>>
>>>>> But VIN is *definitely* a property of one exact single car.
>>>>>
>>>>> Is a car license plate a label or a property? Definitely neither,
>>>>> because you can sell your car and the new owner will get another license
>>>>> plate for it, so I'd model this as
>>>>>
>>>>> MATCH (vhcl:Car:Ford {body_color: 'pink', VIN: 'FGT87356873HU8745'})
>>>>>       -[:HAS_LICENSE_PLATE]->(lp:LicensePlate {state: 'AL', str: 'WH4TWR'})
>>>>>
>>>>>
>>>>> but as you see `LicensePlate` obviously should not be ever mixed with 
>>>>> either `Car` or `Truck`, so they are different labels which do not 
>>>>> intersect.
>>>>>
>>>>> So that Label could be "Category" and there could be two categories,
>>>>>> for example...  CLT_SOURCE and CLT_TARGET.  I thought that makes it
>>>>>> like a variable.  If not, the label is all the same on a given set of
>>>>>> nodes and what's the point in that?
>>>>>>
>>>>>> JFM: OK, I get that *my_node_id* is a variable.
>>>>>>
>>>>>
>>>>> Agh, exactly.
>>>>>  
>>>>>
>>>>>>
>>>>>>>    1. When doing LabelA .csv you will create whatever uniquely
>>>>>>>    numbered nodes were not already in the database, fill their
>>>>>>>    properties (or maybe overwrite them?) and label the node (be it new
>>>>>>>    or existing one) with LabelA - no matter what other labels the node
>>>>>>>    (possibly) had,
>>>>>>>    
>>>>>>>  JFM: OK.  I get it.
>>>>>>
>>>>>>>
>>>>>>>    1. When doing LabelJ .csv you *again* will create whatever
>>>>>>>    uniquely numbered nodes were not already in the database, *again*
>>>>>>>    either fill or overwrite properties, and *again* label the node (be
>>>>>>>    it new or existing one) with LabelJ - no matter what other labels
>>>>>>>    the node (possibly) had,
>>>>>>>    
>>>>>>>  JFM: OK.  I get it.
>>>>>>
>>>>>>>
>>>>>>>    1. so if you created some node with the first file and labeled it
>>>>>>>    LabelA, and the same unique *my_node_id* occurs in both the first and
>>>>>>>    second files, your node will get 2 labels, LabelA and LabelJ.
>>>>>>>    
>>>>>>> JFM: That's what I want!!
>>>>>>
>>>>>
>>>>> Huh, Ok so far :)
>>>>>  
>>>>>
>>>>>> Q5: Since I think of my data in terms of the two classes of nodes in 
>>>>>>>> my Data model …[CLT_SOURCE —> CLT_TARGET ;  CLT_TARGET —>  
>>>>>>>> CLT_SOURCE],  
>>>>>>>> after loading the nodes, how then I get two classes of nodes?
>>>>>>>>
>>>>>>>
>>>>>>> Make them 2 labels: CLTSource and CLTTarget respectively.
>>>>>>>
>>>>>>
>>>>>> JFM: OK.  Regarding the labels...my csv file has a column called DESC
>>>>>> that has two values, CLT_SOURCE and CLT_TARGET.  Are you saying that
>>>>>> my source csv should have a CLT_SOURCE column and my target csv
>>>>>> should have a CLT_TARGET column?  My csv files should NOT have the
>>>>>> configuration I described?
>>>>>>
>>>>>
>>>>> What does CLT really mean in real life? I failed to parse it :( sorry
>>>>> for that. Once again, if you describe the LabCard domain and provide
>>>>> me with a dataset, I'd be able to offer you some better ideas (this
>>>>> may also become a good tutorial sample case for future Neo4j users).
>>>>>  
>>>>>
>>>>>> JFM: Since my csv file has its A thru J columns  A (2) values, B (1),
>>>>>> C (4) D (83), E (83), F (11) G (11) H (83) J (83), K (2), I should have
>>>>>> A LOT of csv files instead of just two for nodes!
>>>>>>
>>>>>
>>>>> Again, I strongly suspect a data modelling issue here.
>>>>>   
>>>>>
>>>>>> JFM: What I am not getting from this is that there is one csv file that
>>>>>>>> has the CLTSOURCE and CLTTARGET labels in it. That contradicts what I
>>>>>>>> said above because that would make only 1 csv file.  I assume there
>>>>>>>> is one LOAD CSV statement, and that my_node_ID:TOINT(csvline[0]) and
>>>>>>>> my_node_ID:TOINT(csvline[1]) presumably refer to two lines in that
>>>>>>>> file.
>>>>>>>>
>>>>>>>
>>>>> As soon as you have both src and target nodes already inside the
>>>>> database, you need a .csv file which describes only relationships: the
>>>>> 1st column contains src node ids, the 2nd column contains dst node ids,
>>>>> and thus 1 row of the .csv describes 1 single relationship per (linked)
>>>>> pair of nodes.
>>>>>
>>>>> For .csv with relationships, csvline[0] is a value of the *my_node_id*
>>>>> property
>>>>>>>>> of the *source* node, csvline[1] is a value of the *my_node_id* property
>>>>>>>>> of the *target* node, and TOINT() type conversion is used because
>>>>>>>>> my personal preference is to use integers for ids.
>>>>>>>>>
>>>>>>>>  
>>>>>>>
>>>>>>>> Is it that ToInt(csvline[0]) refers to a line of the REL.csv
>>>>>>>> file?
>>>>>>>>
>>>>>>>> Does csvline[0] refer to a column in REL.csv as do csvline[2] and 
>>>>>>>> csvline[ZZ] (line 3) ?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> JFM: OK, I think I get it.
>>>>>>  
>>>>>>
>>>>>>> I think you can combine the import of multiple .CSV files in a single
>>>>>>> LOAD CSV statement, but I have never tried this mode.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Andrii
>>>>>>>  
>>>>>>>
>>>>>>
>>>>>> JFM: Thanks!
>>>>>>
>>>>>
>>>>> :)
>>>>>
>>>>> WBR,
>>>>> Andrii
>>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

