Re: [Neo4j] Re: Running a match on multiple parameters

'Michael Hunger' via Neo4j Thu, 20 Apr 2017 04:55:14 -0700

Dave, Kamal,

The APOC library recently gained some similarity functions, which might be
helpful for your use case.


Please have a look:
https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_graph_algorithms_work_in_progress

apoc.algo.cosineSimilarity([vector1], [vector2])

Compute cosine similarity

apoc.algo.euclideanDistance([vector1], [vector2])

Compute Euclidean distance

apoc.algo.euclideanSimilarity([vector1], [vector2])

Compute Euclidean similarity
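
As a rough illustration of what these three measures compute on two answer vectors, here is a minimal Python sketch. It is not APOC's implementation; in particular, defining Euclidean similarity as 1 / (1 + distance) is an assumption based on a common convention, and the function names are only illustrative.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

def euclidean_distance(v1, v2):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def euclidean_similarity(v1, v2):
    # Assumption: similarity = 1 / (1 + distance), so identical vectors score 1.0
    return 1.0 / (1.0 + euclidean_distance(v1, v2))

# Two profiles from the thread, written as answer vectors (quA .. quJ)
profile_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
profile_b = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

print(cosine_similarity(profile_a, profile_b))
print(euclidean_distance(profile_a, profile_b))
print(euclidean_similarity(profile_a, profile_b))
```

A score like this could replace the fixed "strength" values, since it grades how close two profiles are instead of counting exact matches.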

Cheers, Michael

On Wed, Apr 19, 2017 at 10:00 PM, Kamal Murthy <[email protected]> wrote:

> Hi Dave,
>
> MATCH (a1:Profile)
> MATCH (b1:Profile)
> WHERE a1.profileID = 1111111111 AND b1.profileID = 1111111122
> MERGE (a1)-[rel:SIMILAR]-(b1) ON CREATE SET rel.strength = 8
>
> Q: profileID = 1111111111 will have total marks of 10, while profileID =
> 1111111122 will have total marks of 12. I am not sure if that is what you
> want.
>
> In my opinion it is best to group by total marks, with 10 as minimum and
> 50 as maximum, assuming that marks for each question range from 1 to 5.
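
The totals quoted above follow from summing the digits of the concatenated ID. A minimal Python sketch, assuming every answer is a single digit so the ID can be split back into answers (the function name is hypothetical):

```python
def total_marks(profile_id):
    """Sum the digits of a profile ID built by concatenating ten answers."""
    return sum(int(digit) for digit in str(profile_id))

lowest = total_marks(1111111111)    # all ten answers are 1
example = total_marks(1111111122)
highest = total_marks(5555555555)   # all ten answers are 5
print(lowest, example, highest)     # prints: 10 12 50
```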
>
> Q. Generating .csv file.
>
> With ten questions and marks ranging from 1 to 5 for each question, there
> will be 9,765,625 (5^10) records (profiles). One can create a table in a SQL
> Server database with ten columns (Q1 to Q10) and 5 rows, each column holding
> the values 1, 2, 3, 4, 5. Using a cross join, one can generate all the
> combinations (1,1,1,1,1,1,1,1,1,1 to 5,5,5,5,5,5,5,5,5,5). Then you can
> export the data as a .csv file, concatenating the column values to get the
> IDs and summing the values to get the total marks (across all ten
> columns).
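
The same cross join can be sketched without a SQL database. This is a minimal Python version, assuming marks 1 to 5 for every question; the column names "totalMarks" and "Q1".."Q10" are illustrative, not taken from an existing schema.

```python
import csv
import io
from itertools import product

def profile_rows(num_questions=10, marks=(1, 2, 3, 4, 5)):
    """Yield (profile_id, total_marks, *answers) for every answer combination."""
    for answers in product(marks, repeat=num_questions):
        profile_id = "".join(str(a) for a in answers)
        yield (profile_id, sum(answers), *answers)

def write_csv(out, num_questions=10):
    """Write a header plus one row per profile to a file-like object."""
    writer = csv.writer(out)
    header = ["profileID", "totalMarks"]
    header += [f"Q{i}" for i in range(1, num_questions + 1)]
    writer.writerow(header)
    writer.writerows(profile_rows(num_questions))

# Small demonstration: 2 questions give 5^2 = 25 rows, written in memory.
buffer = io.StringIO()
write_csv(buffer, num_questions=2)
demo_lines = buffer.getvalue().splitlines()
print(demo_lines[0])   # header
print(demo_lines[1])   # first profile
print(len(demo_lines) - 1, "profiles")
```

For the full data set, pass an open file instead of the in-memory buffer and keep num_questions=10, which yields 5^10 = 9,765,625 rows.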
>
> You can use this .csv file to create nodes and relationships in Neo4j.
>
> -Kamal
>
>
>
> On Monday, April 10, 2017 at 4:39:00 AM UTC-7, Dave Clissold wrote:
>>
>> Sorry, I got a little confused about what you were asking. Here is the
>> PNG output of the PROFILE. Is this what you were asking for?
>>
>>
>> <https://lh3.googleusercontent.com/-7xZp8YMOzkw/WOtt9QtFQQI/AAAAAAAAASQ/qdexik_7Oo0l0RV3npKsXEqaSQr9RjlqQCLcB/s1600/plan%2B%25281%2529.png>
>>
>>
>> I put in the check for the smaller ID. When I ran an original test, it
>> created 4 different relationships per match, but I think that was because I
>> was using MATCH not MERGE and did not have anything to stop a node from
>> matching itself, such as a1 <> b1. Would this be better and only create a
>> single relationship?
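
To compare the two filters in isolation, here is a small Python sketch that counts the node pairs each one lets through. It only models the pair selection; how many relationships actually get created also depends on whether the query uses CREATE or MERGE. The three IDs are sample values.

```python
ids = [1111111111, 1111111112, 1111111122]

# a <> b: every ordered pair of distinct nodes, so each pair shows up twice
not_equal = [(a, b) for a in ids for b in ids if a != b]

# a < b: each unordered pair shows up exactly once, smaller ID first
less_than = [(a, b) for a in ids for b in ids if a < b]

print(len(not_equal), len(less_than))
```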
>>
>> On Thursday, 23 March 2017 14:35:02 UTC, Dave Clissold wrote:
>>>
>>> I am fairly new to programming, and this is my first time using graph
>>> databases, Cypher and Neo4j. I am learning as I go, testing whether each
>>> stage is a viable route to final development, and trying to gain enough of a
>>> basic understanding of each element needed for the application so that I
>>> can hire and communicate with a full-time team, as well as do the
>>> grunt work when needed, rather than be the entrepreneur who has no clue
>>> about what is happening and just expects things to happen. Any assistance
>>> would be greatly appreciated.
>>>
>>> I am trying to create a database which will allow users with similar
>>> profiles to match. They have answered questions, and I have been able to
>>> create the nodes that represent each possible profile by assigning
>>> a numerical value to each answer, so I have:
>>>
>>> :Profile
>>> quA: 1, quB: 1,quC: 1, quD: 1, quE: 1, quF: 1, quG: 1, quH: 1, quI: 1,
>>> quJ: 1
>>> ....
>>> all the way to
>>> ....
>>> quA: 5, quB: 5,quC: 5, quD: 5, quE: 5, quF: 5, quG: 3, quH: 3, quI: 2,
>>> quJ: 2
>>>
>>> where each numerical value is stored as an integer. This has resulted in
>>> 562,500 nodes imported by CSV, which created a 515 MB database. I have also
>>> concatenated the answers to create a unique ID for each node so that I can
>>> run the following query.
>>>
>>> MATCH (a1:Profile), (b1:Profile)
>>> WHERE a1.profileID < b1.profileID
>>>   AND a1.quA = b1.quA AND a1.quB = b1.quB AND a1.quC = b1.quC
>>>   AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF = b1.quF
>>>   AND a1.quG = b1.quG
>>> CREATE UNIQUE (a1)-[:SIMILAR {strength: 7}]->(b1)
>>>
>>>
>>> and so on, so that I cover every combination from 7 matching parameters up
>>> to 9 matching parameters. I know that will eventually create 175
>>> relationships per node, so a massive total of 98,437,500 relationships.
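
The figures above can be checked with a few lines of Python. The 175 equals the number of ways to choose 7, 8, or 9 of the 10 questions, which is also the number of query variants needed. Whether every node ends up with exactly 175 relationships is taken from the message as stated. The per-question option counts (5 for quA to quF, 3 for quG and quH, 2 for quI and quJ) are an assumption inferred from the last profile listed.

```python
from math import comb

# Ways to pick which 7, 8, or 9 of the 10 questions must match
variants = {k: comb(10, k) for k in (7, 8, 9)}
total_variants = sum(variants.values())

# Assumed option counts per question, inferred from the last profile shown
node_count = 5 ** 6 * 3 ** 2 * 2 ** 2

# Total as stated in the message: 175 relationships for each node
relationships = node_count * total_variants

print(variants, total_variants, node_count, relationships)
```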
>>>
>>>
>>> I have set this up in a Docker container on a Google Compute instance with
>>> 8 cores and 52 GB (the max on the free trial option), with a 65500MB heap
>>> size (based on the calculator).
>>>
>>> I am trying to find out if there is a more efficient way to create these
>>> relationships. On this setup, I have tried running the first query
>>> above; it has currently taken over 5 hours and has not finished. Can
>>> anyone suggest a better query or workflow to create such a large number of
>>> relationships? The last thing I want to do is try to create individual
>>> relationships and input them, unless someone can suggest a way of doing
>>> this via a script that sends the queries via JSON.
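
One possible script-based route, sketched in Python under stated assumptions: profiles are held in memory as a mapping from profile ID to a tuple of ten answers, and the function name is hypothetical. Instead of comparing every node with every other node, it groups profiles by the answers that must match and only pairs profiles inside each group. This is not a tested Neo4j workflow, only an illustration of the idea.

```python
from collections import defaultdict
from itertools import combinations

def similar_pairs(profiles, question_indexes):
    """Yield (id_a, id_b) for profiles that agree on the given questions.

    profiles: dict mapping a profile ID to a tuple of answers.
    question_indexes: positions of the answers that must be equal.
    """
    question_indexes = list(question_indexes)
    buckets = defaultdict(list)
    for profile_id, answers in profiles.items():
        key = tuple(answers[i] for i in question_indexes)
        buckets[key].append(profile_id)
    for ids in buckets.values():
        # Sorting first means each pair appears once, smaller ID first
        yield from combinations(sorted(ids), 2)

profiles = {
    1111111111: (1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    1111111122: (1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
    5111111111: (5, 1, 1, 1, 1, 1, 1, 1, 1, 1),
}

# Questions quA .. quG are positions 0 .. 6
pairs = list(similar_pairs(profiles, range(7)))
print(pairs)
```

The resulting pairs could be written to a CSV file and imported as relationships, so the database never has to evaluate the full cross product of nodes.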
>>>
>>> Regards
>>>
>>>
>>> Dave
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

