Re: [Neo4j] Re: Why is Neo4j slower(totally dead) with many nodes and relationships in lower specification of pc/notebook while MySQL is not?

Michael Hunger Tue, 01 Apr 2014 07:08:39 -0700

Probably something like this, but I'm not sure; too many F's.

MATCH (U:User)-[F:Friend]->(FU:User)-[:Friend]->(FFU:User)
WHERE U.user_id=1
WITH DISTINCT U, FU, FFU
WHERE FFU<>U
MATCH (FFU:User)-[:Friend]->(FFFU:User)
WITH DISTINCT U, FFU, FFFU
WHERE FFFU<>FU
MATCH (FFFU:User)-[:Friend]->(FFFFU:User)
WITH DISTINCT U, FFU, FFFU, FFFFU
WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
RETURN DISTINCT FFFFU.username;


you might also try:

MATCH (U:User)-[F:Friend]->(FU:User)-[:Friend]->(FFU:User)
WHERE U.user_id=1 AND FFU<>U
WITH DISTINCT U, FU, FFU

MATCH (FFU:User)-[:Friend]->(FFFU:User)
WHERE FFFU<>FU
WITH DISTINCT U, FFU, FFFU

MATCH (FFFU:User)-[:Friend]->(FFFFU:User)
WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)

RETURN DISTINCT FFFFU.username;
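
For intuition about why the intermediate WITH DISTINCT steps help, here is a minimal plain-Python sketch (not Neo4j code). It expands a toy friend graph hop by hop, once counting every path and once keeping only distinct end nodes after each hop. The graph generator, the function names, and the simplification of de-duplicating on the end node only (and ignoring the <> and NOT filters) are assumptions made for illustration.

```python
import random

def random_friend_graph(n, p, seed=0):
    """Directed toy graph: each ordered pair (a, b), a != b, is an edge with probability p."""
    rng = random.Random(seed)
    return {a: [b for b in range(n) if b != a and rng.random() < p] for a in range(n)}

def expand_all_paths(graph, start, depth):
    """Count every walk of the given length, like one long pattern without DISTINCT.

    Note: Cypher also forbids reusing a relationship within one pattern,
    so this slightly overestimates what a real MATCH would produce.
    """
    counts = {start: 1}  # end node -> number of walks ending there
    for _ in range(depth):
        nxt = {}
        for node, k in counts.items():
            for friend in graph[node]:
                nxt[friend] = nxt.get(friend, 0) + k
        counts = nxt
    return sum(counts.values())

def expand_distinct(graph, start, depth):
    """Keep only distinct end nodes after each hop; return the frontier size per hop."""
    frontier = {start}
    sizes = []
    for _ in range(depth):
        frontier = {friend for node in frontier for friend in graph[node]}
        sizes.append(len(frontier))
    return sizes

graph = random_friend_graph(n=200, p=0.05, seed=42)
print("walks of length 4:", expand_all_paths(graph, 0, 4))
print("distinct rows carried per hop:", expand_distinct(graph, 0, 4))
```

The walk count grows roughly with the product of the degrees along the way, while the distinct frontier can never exceed the number of users, which is the effect the intermediate steps rely on.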



On Tue, Apr 1, 2014 at 3:44 PM, Rio Eduardo <[email protected]> wrote:

> Thank you.
>
>
> On Tuesday, April 1, 2014 8:31:25 PM UTC+7, Michael Hunger wrote:
>
>> For the traversal framework check out:
>> http://docs.neo4j.org/chunked/milestone/tutorial-traversal.html
>>
>>
>> On Tue, Apr 1, 2014 at 3:09 PM, Rio Eduardo <[email protected]> wrote:
>>
>>> Hi Michael,
>>>
>>> You said "In general if you really want to do these deep traversals you
>>> might be better off (in terms of performance) using the traversal-API with
>>> an appropriate uniqueness constraint, like node-path". Could you point me
>>> to some references so I can learn about it? Or do you mean that I should
>>> use Gremlin?
>>>
>>> Thank you.
>>>
>>>
>>> On Monday, March 31, 2014 8:09:32 PM UTC+7, Michael Hunger wrote:
>>>
>>>> Just use a dataset that you can reason about and check if they work
>>>> correctly.
>>>>
>>>> Hard for me to be the consistency checker on your queries :)
>>>>
>>>> In general if you really want to do these deep traversals you might be
>>>> better off (in terms of performance) using the traversal-API with an
>>>> appropriate uniqueness constraint, like node-path.
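
To illustrate what a uniqueness setting changes in a deep traversal, here is a minimal plain-Python model. It is not the Neo4j traversal API; the function name `traverse`, the mode strings, and the toy graph are assumptions for illustration only. "node_path" forbids a node from repeating within a single path, while "node_global" visits every node at most once in the whole traversal.

```python
from collections import deque

def traverse(graph, start, depth, uniqueness="node_path"):
    """Breadth-first expansion; returns the set of end nodes of paths with exactly `depth` hops."""
    results = set()
    seen = {start}             # only consulted in node_global mode
    queue = deque([(start,)])  # each queue entry is a path (tuple of nodes)
    while queue:
        path = queue.popleft()
        if len(path) - 1 == depth:
            results.add(path[-1])
            continue
        for friend in graph.get(path[-1], ()):
            if uniqueness == "node_path":
                if friend in path:      # node already on this path: skip
                    continue
            else:
                if friend in seen:      # node already visited anywhere: skip
                    continue
                seen.add(friend)
            queue.append(path + (friend,))
    return results

toy = {1: [2, 3], 2: [3, 1], 3: [4], 4: []}
print(traverse(toy, 1, 2, "node_path"))
print(traverse(toy, 1, 2, "node_global"))
```

In this toy graph node 3 is both a direct friend of 1 and reachable in two hops via 2, so it shows up under "node_path" but not under "node_global". Which behaviour is wanted depends on how "friends at depth n" is defined.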
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 1:09 PM, Rio Eduardo <[email protected]> wrote:
>>>>
>>>>> Hello again Michael.
>>>>>
>>>>> I just want to make sure that my query is correct to find friends of
>>>>> friends at depth of four and five. Please help me by checking my query.
>>>>>
>>>>> Query at depth of four:
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH DISTINCT U, FU, FFU
>>>>> WHERE FFU<>U
>>>>> WITH DISTINCT U, FU, FFU
>>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE FFFU<>FU
>>>>> WITH DISTINCT U, FFU, FFFU
>>>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>>>> WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
>>>>> RETURN DISTINCT FFFFU.username;
>>>>>
>>>>> Query at depth of five:
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH DISTINCT U, FU, FFU
>>>>> WHERE FFU<>U
>>>>> WITH DISTINCT U, FU, FFU
>>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE FFFU<>FU
>>>>> WITH DISTINCT U, FFU, FFFU
>>>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>>>> WHERE FFFFU<>FFU
>>>>> WITH DISTINCT U, FFFU, FFFFU
>>>>> MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
>>>>> WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
>>>>> RETURN DISTINCT FFFFFU.username;
>>>>>
>>>>> I need your help so much.
>>>>> Thank you.
>>>>>
>>>>>
>>>>> On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>>>>>
>>>>>> Split it up with one more intermediate step. The intermediate steps
>>>>>> are there to get the cardinality down, so it doesn't have to match
>>>>>> billions of paths, only millions or 100k.
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1
>>>>>> WITH DISTINCT U, FU, FFU
>>>>>> WHERE FFU<>U
>>>>>> WITH DISTINCT U, FFU
>>>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>>> WHERE NOT (U)-[:Friend]->(FFFU)
>>>>>> RETURN distinct FFFU.username;
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <[email protected]> wrote:
>>>>>>
>>>>>>> Please help me again Michael.
>>>>>>>
>>>>>>> You previously said:
>>>>>>>
>>>>>>> I would also change:
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>>>> (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.username
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>>  MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.user_id=1
>>>>>>> WITH distinct U, FFU
>>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.username
>>>>>>>
>>>>>>> The query above finds friends of friends at depth two. I would like
>>>>>>> to find friends of friends at depth three, but when I follow the model
>>>>>>> of your query, it takes longer than mine and returns many more rows.
>>>>>>> Here is the depth-three query modelled on yours:
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>>>> WHERE U.user_id=1
>>>>>>> WITH DISTINCT U, FU, FFU, FFFU
>>>>>>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>>>> RETURN FFFU.username;
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> 118858 rows
>>>>>>> 20090 ms
>>>>>>>
>>>>>>> Mine:
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>>>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT
>>>>>>> (U)-[:Friend]->(FFFU)
>>>>>>> RETURN DISTINCT FFFU.username;
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> 950 rows
>>>>>>> 18133 ms
>>>>>>>
>>>>>>> Please help me: why does the query modelled on yours take longer than
>>>>>>> mine and return many more rows?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>>>>>>
>>>>>>>> Rio,
>>>>>>>>
>>>>>>>> was this your first run of both statements? If so, please run them
>>>>>>>> for a second time.
>>>>>>>> And did you create an index or constraint for :User(user_id) ?
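
The point of running a statement a second time is that the first execution typically pays for cold caches. A minimal sketch of measuring cold versus warm runs, assuming a hypothetical `run_query` callable that wraps whatever database client is in use (replaced here by a stand-in so the example runs on its own):

```python
import time

def time_runs(run_query, repeats=3):
    """Call run_query() `repeats` times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        durations.append((time.perf_counter() - started) * 1000.0)
    return durations

def summarize(durations):
    """Separate the first (cold) run from the best of the later (warm) runs."""
    cold = durations[0]
    warm = min(durations[1:]) if len(durations) > 1 else None
    return {"cold_ms": cold, "warm_ms": warm}

def fake_query():
    """Stand-in for a real database call; swap in your own client call here."""
    sum(range(100_000))

print(summarize(time_runs(fake_query)))
```

Reporting both numbers makes it clear whether a slow result reflects the query itself or just the first, uncached execution.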
>>>>>>>>
>>>>>>>> MATCH (U:User) RETURN COUNT(U);
>>>>>>>>
>>>>>>>> I would also change:
>>>>>>>>
>>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>>>>> (U)-[:Friend]->(FFU)
>>>>>>>> RETURN FFU.username
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>>> WHERE U.user_id=1
>>>>>>>> WITH distinct U, FFU
>>>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>>>> RETURN FFU.username
>>>>>>>>
>>>>>>>> I quickly created a dataset on my machine:
>>>>>>>>
>>>>>>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>>>>>>
>>>>>>>> create constraint on (u:User) assert u.id is unique;
>>>>>>>>
>>>>>>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create
>>>>>>>> (u1)-[:Friend]->(u2);
>>>>>>>>
>>>>>>>> Relationships created: 99974
>>>>>>>>
>>>>>>>> 778 ms
>>>>>>>>
>>>>>>>> match (u:User) return count(*);
>>>>>>>>
>>>>>>>> +----------+
>>>>>>>> | count(*) |
>>>>>>>> +----------+
>>>>>>>> | 1000     |
>>>>>>>> +----------+
>>>>>>>> 1 row
>>>>>>>> *4 ms*
>>>>>>>>
>>>>>>>>
>>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>>> WHERE U.id=1
>>>>>>>> WITH distinct U, FFU
>>>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>>>> RETURN FFU.id;
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> 910 rows
>>>>>>>>
>>>>>>>> 101 ms
>>>>>>>>
>>>>>>>> but even your query takes only
>>>>>>>>
>>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>>>>>>> RETURN FFU.id;
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> 8188 rows
>>>>>>>>
>>>>>>>> 578 ms
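
The row counts above can be sanity-checked with rough expectations for a random graph. This is a minimal sketch, assuming every ordered pair of users is linked independently with probability p (as in the rand() < 0.1 statement) and that nearly every user is reachable within two hops. The function and key names are made up for illustration.

```python
def expected_counts(n, p):
    """Rough expectations for a random directed graph with n users and edge probability p."""
    out_degree = n * p                        # average number of friends per user
    two_hop_paths = out_degree ** 2           # paths U -> FU -> FFU before any filtering
    return {
        "relationships": n * n * p,           # the cartesian product also pairs a node with itself
        "out_degree": out_degree,
        "two_hop_paths": two_hop_paths,
        # drop paths whose end node is already a direct friend of U
        "paths_after_filter": two_hop_paths * (1 - p),
        # distinct end nodes: everyone except U and U's direct friends
        "distinct_estimate": n - out_degree - 1,
    }

for name, value in expected_counts(1000, 0.1).items():
    print(name, round(value))
```

For n=1000 and p=0.1 this gives about 100000 relationships, about 9000 filtered two-hop paths and about 899 distinct users, the same order of magnitude as the 99974, 8188 and 910 reported above. The exact figures depend on the random draw and on how many friends user 1 actually has.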
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <[email protected]>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > ms, it is milliseconds.
>>>>>>>> >
>>>>>>>> > What is the corresponding result for a SQL db ?
>>>>>>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>>>>>>> >
>>>>>>>> > Although that is a valid search, is it useful? I would think that
>>>>>>>> > finding a specific person's FoFoF, with that person as either the
>>>>>>>> > start point or the end point, would be a much more realistic
>>>>>>>> > scenario. Add an index on :User(name), query for the User named
>>>>>>>> > Rio, and try to find his FoFoF.
>>>>>>>> >
>>>>>>>> > Yes, Neo4j has been kind and exposes various functions, such as
>>>>>>>> > shortestPath, in Cypher:
>>>>>>>> > http://docs.neo4j.org/refcard/2.0/
>>>>>>>> >
>>>>>>>> > Also look at some gist examples
>>>>>>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>>>>>>> >
>>>>>>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>>>>>>> >>
>>>>>>>> >> Thank you so much for the reply, Lundin. I really appreciate it.
>>>>>>>> >> Yesterday I ran my experiment again, and the result was not what
>>>>>>>> >> I expected. Before testing 1M users, I reduced the number of
>>>>>>>> >> users to 1000 and tested directly against the database (Neo4j
>>>>>>>> >> Shell) rather than through my social network, to rule out the
>>>>>>>> >> performance of the PC as the cause. Counting the 1000 users took
>>>>>>>> >> 200 ms and returned 1 row, while returning friends at depth two
>>>>>>>> >> took 85000 ms and returned 2500 rows. Are 200 ms and 85000 ms
>>>>>>>> >> fast to you? And what does ms stand for: milliseconds or
>>>>>>>> >> microseconds?
>>>>>>>> >>
>>>>>>>> >> the query I use for returning 1000 users is
>>>>>>>> >> MATCH (U:User) RETURN COUNT(U);
>>>>>>>> >>
>>>>>>>> >> and the query I use for returning friends at depth of two is
>>>>>>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>>>>> (U)-[:Friend]->(FFU)
>>>>>>>> >> RETURN FFU.username
>>>>>>>> >>
>>>>>>>> >> Please note that I tested with the default configuration of
>>>>>>>> >> Neo4j. I created 1000 random User nodes and 50000 random Friend
>>>>>>>> >> relationships (each user has 50 friends). Each relationship has
>>>>>>>> >> the type Friend and no properties. Each node has the label User
>>>>>>>> >> and 4 properties: user_id, username, password and
>>>>>>>> >> profile_picture. Each property value is 1-60 characters long:
>>>>>>>> >> user_id ranges from 1 to 1000, all usernames are 10 random
>>>>>>>> >> characters, all passwords have 60 characters because I MD5 them,
>>>>>>>> >> and profile_picture has 1-60 characters.
>>>>>>>> >>
>>>>>>>> >> Regarding your statement "Otherwise, if you really need to
>>>>>>>> >> present that many "things", just page the result with SKIP and
>>>>>>>> >> LIMIT. It has never made sense to present 1M of anything at a
>>>>>>>> >> time to a user.": I already did that, but it is still the same;
>>>>>>>> >> Neo4j returns the result more slowly.
>>>>>>>> >>
>>>>>>>> >> I'm also wondering whether Neo4j already implements any graph
>>>>>>>> >> algorithms (shortest path, Dijkstra, A*, etc.) in its system.
>>>>>>>> >>
>>>>>>>> >> Thank you.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>>>>>>> >>>
>>>>>>>> >>> Rio, any version will do. They can all handle a million nodes
>>>>>>>> >>> on common hardware, no magic at all. With hundreds of millions
>>>>>>>> >>> or billions, we might need to look into the specification in
>>>>>>>> >>> more detail. But in that case, with that kind of data, there are
>>>>>>>> >>> other bottlenecks for a social network or any web app that need
>>>>>>>> >>> to be taken care of as well.
>>>>>>>> >>>
>>>>>>>> >>> you said:
>>>>>>>> >>>>
>>>>>>>> >>>> Given any two persons chosen at random, is there a path that
>>>>>>>> >>>> connects them that is at most five relationships long? For a
>>>>>>>> >>>> social network containing 1,000,000 people, each with
>>>>>>>> >>>> approximately 50 friends, the results strongly suggest that
>>>>>>>> >>>> graph databases are the best choice for connected data. And
>>>>>>>> >>>> graph database can still work 150 times faster than relational
>>>>>>>> >>>> database at third degree and 1000 times faster at fourth degree
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> I fail to see how this is connected to your attempt to list 1M
>>>>>>>> >>> users in one go on the first page. You would want to check
>>>>>>>> >>> whether there is a relationship between users and return that
>>>>>>>> >>> path. You need two start nodes and seek a path by traversing the
>>>>>>>> >>> relationships rather than scanning tables; that would be the
>>>>>>>> >>> comparison.
>>>>>>>> >>> Otherwise, if you really need to present that many "things",
>>>>>>>> >>> just page the result with SKIP and LIMIT. It has never made
>>>>>>>> >>> sense to present 1M of anything at a time to a user. Again, that
>>>>>>>> >>> wouldn't really do your experiment much good in proving graph
>>>>>>>> >>> theory.
>>>>>>>> >>>
>>>>>>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>>>>>>> >>>
>>>>>>>> >>> Also, when you run your test, make sure to account for the
>>>>>>>> >>> warm/cold cache effect (better/worse performance).
>>>>>>>> >>>
>>>>>>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> I just learned about memory allocation and read Neo4j's Server
>>>>>>>> >>>> Performance Tuning. neo4j.properties:
>>>>>>>> >>>> # Default values for the low-level graph engine
>>>>>>>> >>>>
>>>>>>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>>>>>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>>>>>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>>>>>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>>>>>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>>>>>>> >>>>
>>>>>>>> >>>> Should I change these to get high performance? If so, please
>>>>>>>> >>>> suggest values.
>>>>>>>> >>>>
>>>>>>>> >>>> I also just learned about the Neo4j licenses: Community,
>>>>>>>> >>>> Personal, Startups, Business and Enterprise. The Neo4j website
>>>>>>>> >>>> explains all the features. Which one should I use for my case,
>>>>>>>> >>>> which has millions of nodes and relationships?
>>>>>>>> >>>>
>>>>>>>> >>>> Please answer. I need your help so much.
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks.
>>>>>>>> >>>>
>>>>>>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo
>>>>>>>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> I'm testing my thesis, which is about transforming a
>>>>>>>> >>>>> relational database into a graph database. After the
>>>>>>>> >>>>> transformation, I will compare their performance in terms of
>>>>>>>> >>>>> query response time and throughput. I use MySQL as the
>>>>>>>> >>>>> relational database and Neo4j as the graph database. I will
>>>>>>>> >>>>> have more than 3 million nodes and more than 6 million
>>>>>>>> >>>>> relationships. But when I had added just 60000 nodes, my Neo4j
>>>>>>>> >>>>> was already dead. When I tried to return all 60000 nodes, it
>>>>>>>> >>>>> returned unknown. I did the same in MySQL: I added 60000
>>>>>>>> >>>>> records, and it could return all 60000 records. It's weird
>>>>>>>> >>>>> because it contradicts the papers I read, which say that graph
>>>>>>>> >>>>> databases are faster than relational databases. So why is
>>>>>>>> >>>>> Neo4j slower (totally dead) on a lower-specification
>>>>>>>> >>>>> PC/notebook while MySQL is not? And what PC/notebook
>>>>>>>> >>>>> specification should I use to get the best performance when
>>>>>>>> >>>>> testing with millions of nodes and relationships?
>>>>>>>> >>>>>
>>>>>>>> >>>>> Thank you.
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > You received this message because you are subscribed to the
>>>>>>>> Google Groups "Neo4j" group.
>>>>>>>> > To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>>
>>>>>>>> > For more options, visit https://groups.google.com/d/optout.

