Re: [Neo4j] Re: Why is Neo4j slower(totally dead) with many nodes and relationships in lower specification of pc/notebook while MySQL is not?

Michael Hunger Tue, 01 Apr 2014 06:31:42 -0700

For the traversal framework check out:
http://docs.neo4j.org/chunked/milestone/tutorial-traversal.html
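
To make the "uniqueness" idea concrete, here is a minimal sketch in plain Python. It is not the Neo4j traversal API: the adjacency-dict graph and the names `SAMPLE_GRAPH`, `paths_at_depth` and `friends_at_depth` are made up for illustration only. It assumes that node-path uniqueness means a node may not occur twice within one path, while the same node may still be reached through several different paths.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy graph: user id -> ids of outgoing :Friend neighbours.
Graph = Dict[int, List[int]]

SAMPLE_GRAPH: Graph = {
    1: [2, 3],
    2: [1, 4],
    3: [4, 5],
    4: [1, 6],
    5: [2],
    6: [],
}


def paths_at_depth(graph: Graph, start: int, depth: int) -> Iterator[Tuple[int, ...]]:
    """Yield every path of exactly `depth` hops from `start`.

    A path is dropped as soon as it would revisit a node it already
    contains (the assumed meaning of node-path uniqueness).
    """
    stack: List[Tuple[int, ...]] = [(start,)]
    while stack:
        path = stack.pop()
        if len(path) - 1 == depth:
            yield path
            continue
        for neighbour in graph.get(path[-1], []):
            if neighbour not in path:  # no node twice within one path
                stack.append(path + (neighbour,))


def friends_at_depth(graph: Graph, start: int, depth: int) -> Set[int]:
    """End nodes of all such paths, minus the direct friends of `start`.

    Collecting into a set plays the role of RETURN DISTINCT, and the
    subtraction mirrors the NOT (U)-[:Friend]->(X) filter in the queries.
    """
    direct = set(graph.get(start, []))
    return {path[-1] for path in paths_at_depth(graph, start, depth)} - direct


if __name__ == "__main__":
    for d in (2, 3):
        print(d, sorted(friends_at_depth(SAMPLE_GRAPH, 1, d)))
```

The design point is that cycles back to an already visited node are cut off while the path is being expanded, instead of being filtered afterwards with a chain of `<>` comparisons.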



On Tue, Apr 1, 2014 at 3:09 PM, Rio Eduardo <[email protected]> wrote:

> Hi Michael,
>
> you said "In general if you really want to do these deep traversals you
> might be better off (in terms of performance) using the traversal-API with
> an appropriate uniqueness constraint, like node-path". Could you point me
> to some references so I can learn it? Or do you mean I should use Gremlin?
>
> Thank you.
>
>
> On Monday, March 31, 2014 8:09:32 PM UTC+7, Michael Hunger wrote:
>
>> Just use a dataset that you can reason about and check whether the
>> queries return the correct results.
>>
>> Hard for me to be the consistency checker on your queries :)
>>
>> In general if you really want to do these deep traversals you might be
>> better off (in terms of performance) using the traversal-API with an
>> appropriate uniqueness constraint, like node-path.
>>
>>
>>
>>
>> On Mon, Mar 31, 2014 at 1:09 PM, Rio Eduardo <[email protected]> wrote:
>>
>>> Hello again Michael.
>>>
>>> I just want to make sure that my queries for finding friends of friends
>>> at depths four and five are correct. Please help me by checking them.
>>>
>>> Query at depth of four:
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH DISTINCT U, FU, FFU
>>> WHERE FFU<>U
>>> WITH DISTINCT U, FU, FFU
>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE FFFU<>FU
>>> WITH DISTINCT U, FFU, FFFU
>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>> WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
>>> RETURN DISTINCT FFFFU.username;
>>>
>>> Query at depth of five:
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH DISTINCT U, FU, FFU
>>> WHERE FFU<>U
>>> WITH DISTINCT U, FU, FFU
>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE FFFU<>FU
>>> WITH DISTINCT U, FFU, FFFU
>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>> WHERE FFFFU<>FFU
>>> WITH DISTINCT U, FFFU, FFFFU
>>> MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
>>> WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
>>> RETURN DISTINCT FFFFFU.username;
>>>
>>> I need your help so much.
>>> Thank you.
>>>
>>>
>>> On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>>>
>>>> Split it up in one more intermediate step, the intermediate steps are
>>>> there to get the cardinality down, so it doesn't have to match billions of
>>>> paths, only millions or 100k
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.user_id=1
>>>> WITH DISTINCT U, FU, FFU
>>>> WHERE FFU<>U
>>>> WITH DISTINCT U, FFU
>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>> WHERE NOT (U)-[:Friend]->(FFFU)
>>>> RETURN distinct FFFU.username;
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <[email protected]>wrote:
>>>>
>>>>> Please help me again Michael.
>>>>>
>>>>> You once said:
>>>>>
>>>>> I would also change:
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>> (U)-[:Friend]->(FFU)
>>>>> RETURN FFU.username
>>>>>
>>>>> to
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH distinct U, FFU
>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>> RETURN FFU.username
>>>>>
>>>>> The query above finds friends of friends at depth two. I would now
>>>>> like to find friends of friends at depth three, but when I follow the
>>>>> model of your query, it takes longer than mine and returns many more
>>>>> rows. Here is your query model at depth three:
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH DISTINCT U, FU, FFU, FFFU
>>>>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>> RETURN FFFU.username;
>>>>>
>>>>> ...
>>>>>
>>>>> 118858 rows
>>>>> 20090 ms
>>>>>
>>>>> Mine:
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>> RETURN DISTINCT FFFU.username;
>>>>>
>>>>> ...
>>>>>
>>>>> 950 rows
>>>>> 18133 ms
>>>>>
>>>>> Please help me: why does your query model take longer than mine and
>>>>> return many more results?
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>>
>>>>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>>>>
>>>>>> Rio,
>>>>>>
>>>>>> was this your first run of both statements? If so, please run them
>>>>>> for a second time.
>>>>>> And did you create an index or constraint for :User(user_id) ?
>>>>>>
>>>>>> MATCH (U:User) RETURN COUNT(U);
>>>>>>
>>>>>> I would also change:
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>>> (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> to
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1
>>>>>> WITH distinct U, FFU
>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> I quickly created a dataset on my machine:
>>>>>>
>>>>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>>>>
>>>>>> create constraint on (u:User) assert u.id is unique;
>>>>>>
>>>>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create
>>>>>> (u1)-[:Friend]->(u2);
>>>>>>
>>>>>> Relationships created: 99974
>>>>>>
>>>>>> 778 ms
>>>>>>
>>>>>> match (u:User) return count(*);
>>>>>>
>>>>>> +----------+
>>>>>> | count(*) |
>>>>>> +----------+
>>>>>> | 1000     |
>>>>>> +----------+
>>>>>> 1 row
>>>>>> *4 ms*
>>>>>>
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.id=1
>>>>>> WITH distinct U, FFU
>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.id;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 910 rows
>>>>>>
>>>>>> 101 ms
>>>>>>
>>>>>> but even your query takes only
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.id;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 8188 rows
>>>>>>
>>>>>> 578 ms
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <[email protected]> wrote:
>>>>>> >
>>>>>> > ms, it is milliseconds.
>>>>>> >
>>>>>> > What is the corresponding result for a SQL db ?
>>>>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>>>>> >
>>>>>> > Albeit a valid search, is it something useful? I would think
>>>>>> > finding a specific person's FoFoF at either end, as a starting point
>>>>>> > or end point, would be a very realistic scenario. Try adding an index
>>>>>> > on User:name, querying for a User with name:Rio, and finding his
>>>>>> > FoFoF.
>>>>>> >
>>>>>> > Yes, Neo4j has been kind and exposed various functions, like
>>>>>> > shortestPath, in Cypher:
>>>>>> > http://docs.neo4j.org/refcard/2.0/
>>>>>> >
>>>>>> > Also look at some gist examples
>>>>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>>>>> >
>>>>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>>>>> >>
>>>>>> >> Thank you so much for the reply, Lundin. I really appreciate it.
>>>>>> >> Yesterday I ran my experiment again, and the result was not what I
>>>>>> >> expected. Before testing 1M users, I reduced the number of users to
>>>>>> >> 1000 and tested not in my social network but directly in the
>>>>>> >> database (Neo4j Shell), to check whether the slowness was caused by
>>>>>> >> the performance of the PC. But returning 1000 users took 200 ms
>>>>>> >> (1 row), and returning friends at depth two took 85000 ms
>>>>>> >> (2500 rows). Are 200 ms and 85000 ms fast to you? And what does ms
>>>>>> >> stand for: milliseconds or microseconds?
>>>>>> >>
>>>>>> >> the query I use for returning 1000 users is
>>>>>> >> MATCH (U:User) RETURN COUNT(U);
>>>>>> >>
>>>>>> >> and the query I use for returning friends at depth of two is
>>>>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT
>>>>>> >> (U)-[:Friend]->(FFU)
>>>>>> >> RETURN FFU.username
>>>>>> >>
>>>>>> >> Please note that I tested with the default configuration of Neo4j. I
>>>>>> >> created 1000 random User nodes and 50000 random Friend relationships
>>>>>> >> (each user has 50 friends). Each relationship has the type Friend
>>>>>> >> and no properties. Each node has the label User and 4 properties:
>>>>>> >> user_id, username, password and profile_picture. Each property has a
>>>>>> >> value of 1-60 characters: user_id is 1-1000, all usernames are 10
>>>>>> >> random characters, all passwords have 60 characters because I MD5
>>>>>> >> them, and profile_picture has 1-60 characters.
>>>>>> >>
>>>>>> >> As for your statement "Otherwise if you really need to present that
>>>>>> >> many "things" just paging the result with SKIP,LIMIT. It has never
>>>>>> >> made sense to present 1M of anything at a time for a user.", I
>>>>>> >> already did that, but it is still the same: Neo4j returns results
>>>>>> >> more slowly.
>>>>>> >>
>>>>>> >> And I'm wondering whether Neo4j already implements any graph
>>>>>> >> algorithms (shortest path, Dijkstra, A*, etc.) or not.
>>>>>> >>
>>>>>> >> Thank you.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>>>>> >>>
>>>>>> >>> Rio, any version will do. They can all handle millions of nodes on
>>>>>> >>> common hardware, no magic at all. With hundreds of millions or
>>>>>> >>> billions, we might need to look into the specification in more
>>>>>> >>> detail. But with that kind of data there are other bottlenecks for
>>>>>> >>> a social network or any web app that need to be taken care of as
>>>>>> >>> well.
>>>>>> >>>
>>>>>> >>> you said:
>>>>>> >>>>
>>>>>> >>>>  Given any two persons chosen at random, is there a path that
>>>>>> >>>> connects them that is at most five relationships long? For a social
>>>>>> >>>> network containing 1,000,000 people, each with approximately 50
>>>>>> >>>> friends, the results strongly suggest that graph databases are the
>>>>>> >>>> best choice for connected data. And graph database can still work
>>>>>> >>>> 150 times faster than relational database at third degree and 1000
>>>>>> >>>> times faster at fourth degree
>>>>>> >>>
>>>>>> >>> I fail to see how this is connected to your attempt to list 1M
>>>>>> >>> users in one go on the first page. You would want to check whether
>>>>>> >>> there is a relationship and return that path between users. You
>>>>>> >>> need two start nodes and seek a path by traversing the
>>>>>> >>> relationships rather than scanning tables; that would be the
>>>>>> >>> comparison.
>>>>>> >>> Otherwise, if you really need to present that many "things", just
>>>>>> >>> page the result with SKIP and LIMIT. It has never made sense to
>>>>>> >>> present 1M of anything at a time to a user. Again, that wouldn't
>>>>>> >>> really help your experiment to prove graph theory.
>>>>>> >>>
>>>>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>>>>> >>>
>>>>>> >>> Also, when you do your test, make sure to account for the warm/cold
>>>>>> >>> cache effect (better/worse performance).
>>>>>> >>>
>>>>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>>>>> >>>>
>>>>>> >>>> I just learned about memory allocation and read the Server
>>>>>> >>>> Performance Tuning guide of Neo4j. neo4j.properties:
>>>>>> >>>> # Default values for the low-level graph engine
>>>>>> >>>>
>>>>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>>>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>>>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>>>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>>>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>>>>> >>>>
>>>>>> >>>> Should I change these to get high performance? If yes, please
>>>>>> >>>> suggest values.
>>>>>> >>>>
>>>>>> >>>> And I just learned about the Neo4j licenses: Community, Personal,
>>>>>> >>>> Startups, Business and Enterprise. The Neo4j website explains all
>>>>>> >>>> the features. So which Neo4j should I use for my case, which has
>>>>>> >>>> millions of nodes and relationships?
>>>>>> >>>>
>>>>>> >>>> Please answer. I need your help so much.
>>>>>> >>>>
>>>>>> >>>> Thanks.
>>>>>> >>>>
>>>>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>>>>> >>>>>
>>>>>> >>>>> I'm testing my thesis, which is about transforming a relational
>>>>>> >>>>> database into a graph database. After the transformation, I will
>>>>>> >>>>> compare their performance in terms of query response time and
>>>>>> >>>>> throughput. I use MySQL as the relational database and Neo4j as
>>>>>> >>>>> the graph database. I will have more than 3 million nodes and
>>>>>> >>>>> more than 6 million relationships. But when I had added just
>>>>>> >>>>> 60000 nodes, my Neo4j was already dead. When I tried to return
>>>>>> >>>>> all 60000 nodes, it returned unknown. I did the same in MySQL: I
>>>>>> >>>>> added 60000 records, and it could return all 60000 records. It's
>>>>>> >>>>> weird because it contradicts the papers I read, which told me
>>>>>> >>>>> graph databases are faster than relational databases. So why is
>>>>>> >>>>> Neo4j slower (totally dead) on a lower-specification PC/notebook
>>>>>> >>>>> while MySQL is not? And what PC/notebook specification should I
>>>>>> >>>>> use to get the best performance when testing with millions of
>>>>>> >>>>> nodes and relationships?
>>>>>> >>>>>
>>>>>> >>>>> Thank you.
>>>>>> >
>>>>>> > --
>>>>>> > You received this message because you are subscribed to the Google
>>>>>> > Groups "Neo4j" group.
>>>>>> > To unsubscribe from this group and stop receiving emails from it,
>>>>>> > send an email to [email protected].
>>>>>> > For more options, visit https://groups.google.com/d/optout.
>>>>>>
