Re: [Neo4j] Understanding effect of LIMIT on planning

Jesse Hemingway Sat, 05 Mar 2016 14:27:06 -0800

And specifying those indices explicitly cut the db hits in half! I had read 
your Performance Tuning article and had put that little nugget on the back 
burner, not realizing how critical it could be!


On Saturday, March 5, 2016 at 4:13:23 PM UTC-6, Jesse Hemingway wrote:
>
> Wow, there are a lot of ways to crack this nut. Thanks again for the 
> thought-provoking options. For a moment, I thought I was getting the hang 
> of Cypher, there is obviously I've only scratched the surface.
>
> In the meantime, I'm back to square 1 on the first query, as I realized my 
> final solution, while having 45% fewer hits, no longer returns posts in 
> descending order. I think I better re-read everything I've read so far 
> about Cypher!
>
> To your point about generating the string rather than passing a parameter, 
> this will eventually be used with a highly available API server, and my 
> understanding was that parametric plans could be cached and reused on 
> subsequent queries. My limit value may bounce around a lot depending on 
> circumstances.
>
> On Saturday, March 5, 2016 at 3:21:57 PM UTC-6, Michael Hunger wrote:
>>
>> If the limit a parameter you could in those cases generate the string for 
>> the statement and replace the numeric value.
>>
>> Or you could use WHERE size(rels) < {limit} but I would go for creating 
>> the query string.
>>
>> the second query was not academic (my suggestions usually are not :) but 
>> a *potentially* efficient query. 
>> It finds the single longest path up to a limit, discarding the others and 
>> using then the nodes of the path instead.
>>
>> For your other question. I'd try (presuming Category(uuid) is indexed:
>> either
>>
>> PROFILE MATCH (r:Reference {id: 
>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
>> USING INDEX r:Reference(id)
>> USING INDEX c:Category(uuid)
>> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> WITH p, c LIMIT 20
>> RETURN p, COLLECT(c.uuid) as c
>>
>> OR
>>
>> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
>> Post)-[:PREVIOUS*0..100]->(p)
>> WHERE (p)-[:MEMBER_OF]->(c)
>> WITH p, c LIMIT 20
>> RETURN p, COLLECT(c.uuid) as c
>>
>> or 
>>
>> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> WITH collect(c) as categories
>> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
>> Post)-[:PREVIOUS*0..100]->(p)
>> WITH p, [c in categories WHERE (p)-[:MEMBER_OF]->(c) | c.uuid] as 
>> categories where size(categories) > 0
>> RETURN p, categories 
>> limit 20
>>
>>
>>
>> On Sat, Mar 5, 2016 at 6:17 PM, Jesse Hemingway <[email protected]> 
>> wrote:
>>
>>> Thank you Michael, 
>>>
>>> This is great insight, and helps me better understand query structure. 
>>> As I need the limit to be a parameter, I think it is not possible to use 
>>> path length as the limiter. It also turns out not every post is 
>>> categorized, so I found this variation on your suggestion to be the one 
>>> that worked best, and cut DB hits by 45%!
>>>
>>> // Get latest posts and their categories
>>> PROFILE MATCH (r:Reference {id: 
>>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..]->(p)
>>> WITH p LIMIT 20
>>> OPTIONAL MATCH (p)-[:MEMBER_OF]->(c:Category)
>>> RETURN p, COLLECT(c.uuid) as c
>>>
>>> Your second suggestion, collecting, sorting and unwinding the paths is 
>>> complicated and harder to maintain/explain. Did you suggest this as an 
>>> academic alternative, or potentially more performant (if I didn't have the 
>>> parameter requirement)?
>>>
>>> Next, I sometimes need to get just the recent posts belonging to any of 
>>> a set of categories (which will be a parameter as well). Could you comment 
>>> on whether the following query could be further optimized? It takes over 7x 
>>> more db hits than the simpler query (@ limit 20, and may be non-linear as 
>>> limit grows, not sure why):
>>>
>>> // Get latest posts in 2 categories
>>> PROFILE MATCH (r:Reference {id: 
>>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
>>> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
>>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>>> WITH p, c LIMIT 20
>>> RETURN p, COLLECT(c.uuid) as c
>>>
>>> Regards, and thanks again.
>>>
>>>
>>>
>>>
>>>
>>> On Saturday, March 5, 2016 at 4:40:18 AM UTC-6, Michael Hunger wrote:
>>>>
>>>> Hi Jesse,
>>>>
>>>> your first query has to aggregate over all the p's first to get the 
>>>> cagetories (you never know if the last p is the same as a previous one in 
>>>> the result set and has yet another category that you also have to collect 
>>>> for the post)
>>>>
>>>> yours query just takes the first 50 pairs of posts and categories and 
>>>> then groups them
>>>>
>>>> you could also do this which should be more efficient.
>>>>
>>>> PROFILE MATCH (r:Reference {id: 
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)
>>>> WITH p LIMIT 50
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> Also if you're interested in only the latest 50 you can also just limit 
>>>> the path length
>>>>
>>>> PROFILE MATCH (r:Reference {id: 
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..50]->(p)
>>>> WITH p LIMIT 50
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> also add the relationship-type between Reference and post, so that the 
>>>> planner can do a better job
>>>>
>>>> another thing you can try is to just get the longest path and then take 
>>>> all the nodes of the path.
>>>>
>>>> PROFILE MATCH (r:Reference {id: "latest"})-->(first:Post)
>>>> MATCH path = (first)-[rels:PREVIOUS*0..50]->(p)
>>>> WITH path
>>>> ORDER BY size(rels) DESC LIMIT 1
>>>> UNWIND nodes(path) as p
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> HTH Michael
>>>>
>>>> On Fri, Mar 4, 2016 at 10:26 PM, Jesse Hemingway <[email protected]> 
>>>> wrote:
>>>>
>>>>> I'm pretty new to neo4j, and I'm working on optimizing my queries. For 
>>>>> my example, let's say we have a lot of 'Post' nodes that can grow in an 
>>>>> unbounded fashion. I'm using a linked-list to be able to scan back 
>>>>> through 
>>>>> recent posts without consulting every node. This works, but my question 
>>>>> has 
>>>>> to do with a minor semantic change that means the difference between 
>>>>> consulting LIMIT nodes, or ALL nodes. First, the bad one; this query 
>>>>> consults all Post nodes:
>>>>>
>>>>> MATCH (r:Reference {id: 
>>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>>> RETURN p, COLLECT(c.uuid) AS c LIMIT 50
>>>>>
>>>>> However, this next query only consults only the last 50, as I'd like. 
>>>>> Could someone explain why putting the LIMIT on the WITH makes all the 
>>>>> difference?
>>>>>
>>>>> PROFILE MATCH (r:Reference {id: 
>>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>>> WITH p, c LIMIT 50
>>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>>
>>>>> Thanks in advance for any insight!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Understanding effect of LIMIT on planning

Reply via email to