Hi Michael,
Thanks for this. I tried breaking everything down with a limit of 1,000, but
it still takes forever to run. Do you know of any way to create a repeating
script or loop that can parse a CSV file with one permutation per line (each
answer in a different column), concatenate those numbers into the various
queries, and repeat until the CSV file is finished? I have mapped out all
the potential combinations for 8 parameters and would be creating individual
relationships based on each line (array) received from the CSV file, for
example:
quA, quB, quC, quD, quE, quF, quG, quH, 1, 1
quA, quB, quC, quD, quE, quF, quG, 1, quI, 1
quA, quB, quC, quD, quE, quF, 1, quH, quI, 1
through to....
5, 5, quC, quD, quE, quF, quG, quH, quI, quJ
*(where each qu* represents a column in the CSV file)*
*and the merge query would look like:*
MATCH (a1:Profile)
MATCH (b1:Profile)
WHERE a1.profileID = 1111111111 AND b1.profileId = 1111111122
MERGE (a1)-[rel:SIMILAR]-(b1) ON CREATE SET rel.strength = 8
There are 720 of these in total, and if I could feed each of the 562,500
into this as a batch, it would probably work without causing me a bunch of
headaches, so I could then get on with testing the ideas behind the
application.
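
Something along these lines is what I had in mind, as a rough, untested
sketch. It assumes I first pre-compute the matching pairs into a CSV I
generate myself (pairs.csv and the columns sourceId, targetId and strength
are just placeholder names), and that the property is consistently named
profileId and stored as an integer:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///pairs.csv' AS row
// look up both profiles by the IDs on the current CSV line
MATCH (a1:Profile {profileId: toInteger(row.sourceId)})
MATCH (b1:Profile {profileId: toInteger(row.targetId)})
// create the relationship once, setting strength only the first time
MERGE (a1)-[rel:SIMILAR]-(b1)
ON CREATE SET rel.strength = toInteger(row.strength)

Would something like that be a sensible direction?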
Being on a self-educating path is really showing its limitations now.
Dave
On Thursday, 23 March 2017 21:30:47 UTC, Michael Hunger wrote:
>
> Hi Dave,
>
> would be good to look at a sample first of all:
>
> you should create about 10k-100k relationships per transaction.
>
> For "joining" nodes, which is not an optimized graph operation, you should
> at least have the most selective properties indexed.
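>
> For example, something like this (just a sketch; adjust the property names
> to your actual schema):
>
> CREATE INDEX ON :Profile(quA);
> CREATE INDEX ON :Profile(profileId);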
>
> Before running the queries, I suggest using EXPLAIN / PROFILE,
>
> e.g.
>
> MATCH (a1:Profile), (b1:Profile)
> WHERE a1.profileID < b1.profileId AND a1.quA = b1.quA AND a1.quB = b1.quB
> AND a1.quC = b1.quC AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF =
> b1.quF AND a1.quG = b1.quG
> CREATE UNIQUE (a1)-[:SIMILAR {strength: 7} ]->(b1)
>
> PROFILE / EXPLAIN
> MATCH (a1:Profile)
> WITH a1 LIMIT 1000 // sample
> MATCH (b1:Profile)
> WHERE a1.profileID < b1.profileId AND a1.quA = b1.quA AND a1.quB = b1.quB
> AND a1.quC = b1.quC AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF =
> b1.quF AND a1.quG = b1.quG
> MERGE (a1)-[rel:SIMILAR]-(b1) ON CREATE SET rel.strength = 7
>
> You should at least see one index lookup for b1, ideally on the most
> selective property.
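>
> If you have APOC installed you could also batch the whole run to stay in
> that 10k-100k relationships per transaction range, e.g. (rough sketch,
> untested; adjust the property names to your schema):
>
> CALL apoc.periodic.iterate(
>   "MATCH (a1:Profile), (b1:Profile)
>    WHERE a1.profileId < b1.profileId AND a1.quA = b1.quA AND a1.quB = b1.quB
>      AND a1.quC = b1.quC AND a1.quD = b1.quD AND a1.quE = b1.quE
>      AND a1.quF = b1.quF AND a1.quG = b1.quG
>    RETURN a1, b1",
>   "MERGE (a1)-[rel:SIMILAR]-(b1) ON CREATE SET rel.strength = 7",
>   {batchSize: 10000, parallel: false})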
>
> Michael
>
>
> On Thu, Mar 23, 2017 at 3:35 PM, Dave Clissold <[email protected]> wrote:
>
>> I am fairly new to programming, and this is my first time using graph
>> databases, Cypher and Neo4j. I am learning as I go, testing whether each
>> stage is a viable route to final development and trying to gain enough of
>> a basic understanding of each element needed for the application that I
>> can hire and communicate with a full-time team, as well as do the grunt
>> work when needed, rather than be the entrepreneur who has no clue about
>> what is happening and just expects things to happen. Any assistance would
>> be greatly appreciated.
>>
>> I am trying to create a database which will allow users with similar
>> profiles to match. Users answer questions, and by assigning a numerical
>> value to each answer I have been able to create nodes representing every
>> possible profile, so I have:
>>
>> :Profile
>> quA: 1, quB: 1,quC: 1, quD: 1, quE: 1, quF: 1, quG: 1, quH: 1, quI: 1,
>> quJ: 1
>> ....
>> all the way to
>> ....
>> quA: 5, quB: 5,quC: 5, quD: 5, quE: 5, quF: 5, quG: 3, quH: 3, quI: 2,
>> quJ: 2
>>
>> where each numerical value is stored as an integer. This resulted in
>> 562,500 nodes imported via CSV, creating a 515 MB database. I have also
>> concatenated the answers to create a unique ID for each node so that I can
>> run the following query.
>>
>> MATCH (a1:Profile), (b1:Profile)
>> WHERE a1.profileID < b1.profileId AND a1.quA = b1.quA AND a1.quB = b1.quB
>> AND a1.quC = b1.quC AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF =
>> b1.quF AND a1.quG = b1.quG
>> CREATE UNIQUE (a1)-[:SIMILAR {strength: 7} ]->(b1)
>>
>>
>> and so on, so that I cover every combination from 7 parameters matching
>> up to 9 parameters matching. I know that will eventually create 175
>> relationships per node, for a massive total of 98,437,500 relationships.
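>>
>> (For reference, the concatenated ID I mentioned above was built roughly
>> like the following; this is a sketch from memory, so the exact expression
>> and property names may differ.)
>>
>> MATCH (p:Profile)
>> SET p.profileId = toInteger(
>>     toString(p.quA) + toString(p.quB) + toString(p.quC) + toString(p.quD) +
>>     toString(p.quE) + toString(p.quF) + toString(p.quG) + toString(p.quH) +
>>     toString(p.quI) + toString(p.quJ))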
>>
>>
>> I have set this up in a Docker container on a Google Compute Engine
>> instance with 8 cores and 52 GB of RAM (the maximum on the free trial
>> option), with a 65500 MB heap size (based on the calculator).
>>
>> I am trying to find out if there is a more efficient way to create these
>> relationships: on this setup I have tried running the first query above,
>> and it has currently been running for over 5 hours without finishing. Can
>> anyone suggest a better query or workflow to create such a large number of
>> relationships? The last thing I want to do is create individual
>> relationships and input them one by one, unless someone can suggest a way
>> of doing this via a script that sends the queries via JSON.
>>
>> Regards
>>
>>
>> Dave
>>