You should be able to test your queries yourself: run them against a small dataset, look at the results, and reason about whether they are correct.
And performance-wise, if they are fast enough, that's good too.

On Mon, Mar 31, 2014 at 4:46 PM, Rio Eduardo <[email protected]> wrote:

> Hi Michael,
> you mean don't use dataset that doesn't make sense?
>
> Please help me just by checking my last two queries. I'm testing my thesis
> until depth of five only.
>
> and is there no others way to speed up the traversal in cypher? maybe
> shortestPath? is the only way using traversal-API?
>
> Please help me, I have to graduate this year.
> Thank you.
>
>
> On Monday, March 31, 2014 8:09:32 PM UTC+7, Michael Hunger wrote:
>
>> Just use a dataset that you can reason about and check if they work
>> correctly.
>>
>> Hard for me to be the consistency checker on your queries :)
>>
>> In general if you really want to do these deep traversals you might be
>> better off (in terms of performance) using the traversal-API with an
>> appropriate uniqueness constraint, like node-path.
>>
>> On Mon, Mar 31, 2014 at 1:09 PM, Rio Eduardo <[email protected]> wrote:
>>
>>> Hello again Michael.
>>>
>>> I just want to make sure that my query is correct to find friends of
>>> friends at depth of four and five. Please help me by checking my query.
>>>
>>> Query at depth of four:
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH DISTINCT U, FU, FFU
>>> WHERE FFU<>U
>>> WITH DISTINCT U, FU, FFU
>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE FFFU<>FU
>>> WITH DISTINCT U, FFU, FFFU
>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>> WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
>>> RETURN DISTINCT FFFFU.username;
>>>
>>> Query at depth of five:
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH DISTINCT U, FU, FFU
>>> WHERE FFU<>U
>>> WITH DISTINCT U, FU, FFU
>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE FFFU<>FU
>>> WITH DISTINCT U, FFU, FFFU
>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>> WHERE FFFFU<>FFU
>>> WITH DISTINCT U, FFFU, FFFFU
>>> MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
>>> WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
>>> RETURN DISTINCT FFFFFU.username;
>>>
>>> I need your help so much.
>>> Thank you.
>>>
>>>
>>> On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>>>
>>>> Split it up in one more intermediate step, the intermediate steps are
>>>> there to get the cardinality down, so it doesn't have to match billions of
>>>> paths, only millions or 100k
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>> WHERE U.user_id=1
>>>> WITH DISTINCT U, FU, FFU
>>>> WHERE FFU<>U
>>>> WITH DISTINCT U, FFU
>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>> WHERE NOT (U)-[:Friend]->(FFFU)
>>>> RETURN distinct FFFU.username;
>>>>
>>>> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <[email protected]> wrote:
>>>>
>>>>> Please help me again Michael.
>>>>>
>>>>> You ever said:
>>>>>
>>>>> I would also change:
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>> RETURN FFU.username
>>>>>
>>>>> to
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH distinct U, FFU
>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>> RETURN FFU.username
>>>>>
>>>>> Query above is to find friends of friends at depth of two. And I would
>>>>> like to find friends of friends at depth of three, when I use model of
>>>>> your query, it returns result longer than mine and the result is much more
>>>>> than mine. Ok so here is model of your query at depth of three:
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH DISTINCT U, FU, FFU, FFFU
>>>>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>> RETURN FFFU.username;
>>>>>
>>>>> ...
>>>>>
>>>>> 118858 rows
>>>>> 20090 ms
>>>>>
>>>>> Mine:
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>> RETURN DISTINCT FFFU.username;
>>>>>
>>>>> ...
>>>>>
>>>>> 950 rows
>>>>> 18133 ms
>>>>>
>>>>> Please help me, Why is model of your query longer than mine and return
>>>>> much more results than mine?
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>>>>
>>>>>> Rio,
>>>>>>
>>>>>> was this your first run of both statements? If so, please run them
>>>>>> for a second time.
>>>>>> And did you create an index or constraint for :User(user_id) ?
>>>>>>
>>>>>> MATCH (U:User) RETURN COUNT(U);
>>>>>>
>>>>>> I would also change:
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> to
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1
>>>>>> WITH distinct U, FFU
>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> I quickly created a dataset on my machine:
>>>>>>
>>>>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>>>>
>>>>>> create constraint on (u:User) assert u.id is unique;
>>>>>>
>>>>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create (u1)-[:Friend]->(u2);
>>>>>>
>>>>>> Relationships created: 99974
>>>>>>
>>>>>> 778 ms
>>>>>>
>>>>>> match (u:User) return count(*);
>>>>>>
>>>>>> +----------+
>>>>>> | count(*) |
>>>>>> +----------+
>>>>>> | 1000     |
>>>>>> +----------+
>>>>>> 1 row
>>>>>> *4 ms*
>>>>>>
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.id=1
>>>>>> WITH distinct U, FFU
>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.id;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 910 rows
>>>>>>
>>>>>> 101 ms
>>>>>>
>>>>>> but even your query takes only
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.id;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 8188 rows
>>>>>>
>>>>>> 578 ms
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <[email protected]> wrote:
>>>>>> >
>>>>>> > ms, it is milliseconds.
>>>>>> >
>>>>>> > What is the corresponding result for a SQL db ?
>>>>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>>>>> >
>>>>>> > Albeit a valid search is it something useful ?
>>>>>> > I would think finding a specific persons FoFoF in either end, as a
>>>>>> > starting point or end point, would be a very realistic scenario.
>>>>>> > Adding an Index on User:name and query for a User with name:Rio try
>>>>>> > to find his FoFoF.
>>>>>> >
>>>>>> > Yes, neo4j has been kind and exposed various function, like
>>>>>> > shortestpath in cypher
>>>>>> > http://docs.neo4j.org/refcard/2.0/
>>>>>> >
>>>>>> > Also look at some gist examples
>>>>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>>>>> >
>>>>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>>>>> >>
>>>>>> >> Thank you so much for the reply Lundin. I really appreciate it.
>>>>>> >> Okay, yesterday I just tested my experiment again. And the result
>>>>>> >> was not what I imagined and expected before. Okay, before I tested
>>>>>> >> 1M users, I reduced the number of users into 1000 users and tested
>>>>>> >> it not in my social network but directly in database only (Neo4j
>>>>>> >> Shell) to find out that it was not caused by the performance of pc.
>>>>>> >> But the result of returning 1000 users was 200ms and 1 row and the
>>>>>> >> result of returning friends at depth of two was 85000ms and 2500
>>>>>> >> rows and are 200ms and 85000ms fast to you? and what does ms stand
>>>>>> >> for? is it milliseconds or microseconds?
>>>>>> >>
>>>>>> >> the query I use for returning 1000 users is
>>>>>> >> MATCH (U:User) RETURN COUNT(U);
>>>>>> >>
>>>>>> >> and the query I use for returning friends at depth of two is
>>>>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>>> >> RETURN FFU.username
>>>>>> >>
>>>>>> >> Please note that I tested with default configuration of Neo4j and
>>>>>> >> created users with 1000 random nodes and created friends
>>>>>> >> relationships with 50000 random relationships (1 user has 50 friends).
>>>>>> >> Each relationship has a label Friend and no properties on it. Each
>>>>>> >> node has a label User, 4 properties: user_id, username, password
>>>>>> >> and profile_picture. Each property has a value of 1-60 characters.
>>>>>> >> average of characters of user_id=1-1000 characters, all usernames
>>>>>> >> have 10 characters randomly, all passwords have 60 characters
>>>>>> >> because I MD5 it, and profile_picture has 1-60 characters.
>>>>>> >>
>>>>>> >> And about your statement "Otherwise if you really need to present
>>>>>> >> that many "things" just paging the result with SKIP,LIMIT. It has
>>>>>> >> never made sense to present 1M of anything at a time for a user.",
>>>>>> >> I already did according to your statement above but it is still the
>>>>>> >> same, Neo4j returns result slower.
>>>>>> >>
>>>>>> >> And I'm wondering if Neo4j already applied one of graph algorithms
>>>>>> >> (shortest path, Dijkstra, A*, etc) in its system or not.
>>>>>> >>
>>>>>> >> Thank you.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>>>>> >>>
>>>>>> >>> Rio, any version will do. They can all handle million nodes on
>>>>>> >>> common hardware, no magic at all. When it is hundreds of millions
>>>>>> >>> or billions, then we might need to look into specification more in
>>>>>> >>> detail. But in that case with that kind of data there are other
>>>>>> >>> bottlenecks for a social network or any web app that needs to be
>>>>>> >>> taken care of as well.
>>>>>> >>>
>>>>>> >>> you said:
>>>>>> >>>>
>>>>>> >>>> Given any two persons chosen at random, is there a path that
>>>>>> >>>> connects them that is at most five relationships long? For a
>>>>>> >>>> social network containing 1,000,000 people, each with
>>>>>> >>>> approximately 50 friends, the results strongly suggest that graph
>>>>>> >>>> databases are the best choice for connected data.
>>>>>> >>>> And graph database can still work 150 times faster than
>>>>>> >>>> relational database at third degree and 1000 times faster at
>>>>>> >>>> fourth degree
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> I fail to see how this is connected to your attempt to list 1M
>>>>>> >>> users in one go at the first page. You would want to seek if there
>>>>>> >>> is a relationship and return that path between users. You need two
>>>>>> >>> start nodes and seek a path by traversing the relationship rather
>>>>>> >>> than scan tables and that would be the comparison.
>>>>>> >>> Otherwise if you really need to present that many "things" just
>>>>>> >>> paging the result with SKIP,LIMIT. It has never made sense to
>>>>>> >>> present 1M of anything at a time for a user. Again, that wouldn't
>>>>>> >>> really serve your experiment much good to prove graph theory.
>>>>>> >>>
>>>>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>>>>> >>>
>>>>>> >>> Also when you do your test make sure to add the warm/cold cache
>>>>>> >>> effect (better/worse performance)
>>>>>> >>>
>>>>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>>>>> >>>>
>>>>>> >>>> I just knew about memory allocation and just read Server
>>>>>> >>>> Performance Tuning of Neo4j. neo4j.properties:
>>>>>> >>>> # Default values for the low-level graph engine
>>>>>> >>>>
>>>>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>>>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>>>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>>>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>>>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>>>>> >>>>
>>>>>> >>>> Should I change this to get high performance? If yes, please
>>>>>> >>>> suggest me.
>>>>>> >>>>
>>>>>> >>>> And I just knew about Neo4j Licenses, they are Community,
>>>>>> >>>> Personal, Startups, Business and Enterprise. And at Neo4j website
>>>>>> >>>> all features are explained. So which Neo4j should I use for my
>>>>>> >>>> case that has millions nodes and relationships?
>>>>>> >>>>
>>>>>> >>>> Please answer. I need your help so much.
>>>>>> >>>>
>>>>>> >>>> Thanks.
>>>>>> >>>>
>>>>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>>>>> >>>>>
>>>>>> >>>>> I'm testing my thesis which is about transforming from
>>>>>> >>>>> relational database to graph database. After transforming from
>>>>>> >>>>> relational database to graph database, I will test their own
>>>>>> >>>>> performance according to query response time and throughput. In
>>>>>> >>>>> relational database, I use MySQL while in graph database I use
>>>>>> >>>>> Neo4j for testing. I will have 3 Million more nodes and 6
>>>>>> >>>>> Million more relationships. But when I just added 60000 nodes,
>>>>>> >>>>> my Neo4j is already dead. When I tried to return all 60000
>>>>>> >>>>> nodes, it returned unknown. I did the same to MySQL, I added
>>>>>> >>>>> 60000 records but it could return all 60000 records. It's weird
>>>>>> >>>>> because it's against the papers I read that told me graph
>>>>>> >>>>> database is faster than relational database. So why is Neo4j
>>>>>> >>>>> slower (totally dead) in lower specification of pc/notebook
>>>>>> >>>>> while MySQL is not? And what specification of pc/notebook
>>>>>> >>>>> should I use to give the best performance during testing with
>>>>>> >>>>> millions of nodes and relationships?
>>>>>> >>>>>
>>>>>> >>>>> Thank you.
>>>>>> >
>>>>>> > --
>>>>>> > You received this message because you are subscribed to the Google
>>>>>> > Groups "Neo4j" group.
>>>>>> > To unsubscribe from this group and stop receiving emails from it,
>>>>>> > send an email to [email protected].
>>>>>> > For more options, visit https://groups.google.com/d/optout.
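The advice at the top of the thread, to check the queries against a dataset small enough to reason about, can be sketched outside the database. The snippet below is only an illustration under stated assumptions: the five-user graph `FRIENDS` and the helper `friends_of_friends` are hypothetical names, not part of Neo4j or of anything posted in the thread. It mirrors the filters of the depth-two query quoted above (`FFU<>U AND NOT (U)-[:Friend]->(FFU)`) using plain sets, so the expected rows can be worked out by hand and compared with what Cypher returns on the same data.

```python
# Hypothetical toy dataset: user id -> ids reached by an outgoing :Friend edge.
FRIENDS = {
    1: {2, 3},
    2: {3, 4},
    3: {1, 5},
    4: {1},
    5: set(),
}


def friends_of_friends(friends, user):
    """Users exactly two outgoing :Friend hops away from `user`,
    excluding `user` itself and anyone who is already a direct friend.
    This follows the filters of the depth-two Cypher query in the thread."""
    direct = friends.get(user, set())
    two_hops = set()
    for fu in direct:
        # Collect everyone the direct friend points to.
        two_hops |= friends.get(fu, set())
    return two_hops - direct - {user}


if __name__ == "__main__":
    # Hand check: 1 -> {2, 3}; 2 -> {3, 4}; 3 -> {1, 5}.
    # Two-hop candidates are {1, 3, 4, 5}; drop 1 (self) and 3 (direct friend).
    print(sorted(friends_of_friends(FRIENDS, 1)))  # prints [4, 5]
```

Loading the same five users and seven relationships into a scratch Neo4j database and running the depth-two query for `user_id=1` should give the same two users if the query is correct; the deeper queries can be checked the same way by extending the helper one hop at a time.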
