Re: [Neo4j] Understanding effect of LIMIT on planning

Jesse Hemingway Sat, 05 Mar 2016 14:14:13 -0800

Wow, there are a lot of ways to crack this nut. Thanks again for the 
thought-provoking options. For a moment, I thought I was getting the hang 
of Cypher, there is obviously I've only scratched the surface.


In the meantime, I'm back to square 1 on the first query, as I realized my 
final solution, while having 45% fewer hits, no longer returns posts in 
descending order. I think I better re-read everything I've read so far 
about Cypher!

To your point about generating the string rather than passing a parameter, 
this will eventually be used with a highly available API server, and my 
understanding was that parametric plans could be cached and reused on 
subsequent queries. My limit value may bounce around a lot depending on 
circumstances.

On Saturday, March 5, 2016 at 3:21:57 PM UTC-6, Michael Hunger wrote:
>
> If the limit a parameter you could in those cases generate the string for 
> the statement and replace the numeric value.
>
> Or you could use WHERE size(rels) < {limit} but I would go for creating 
> the query string.
>
> the second query was not academic (my suggestions usually are not :) but a 
> *potentially* efficient query. 
> It finds the single longest path up to a limit, discarding the others and 
> using then the nodes of the path instead.
>
> For your other question. I'd try (presuming Category(uuid) is indexed:
> either
>
> PROFILE MATCH (r:Reference {id: 
> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
> USING INDEX r:Reference(id)
> USING INDEX c:Category(uuid)
> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
> 'b583cee0-e16c-11e5-8722-fef6f6021414']
> WITH p, c LIMIT 20
> RETURN p, COLLECT(c.uuid) as c
>
> OR
>
> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
> 'b583cee0-e16c-11e5-8722-fef6f6021414']
> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
> Post)-[:PREVIOUS*0..100]->(p)
> WHERE (p)-[:MEMBER_OF]->(c)
> WITH p, c LIMIT 20
> RETURN p, COLLECT(c.uuid) as c
>
> or 
>
> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
> 'b583cee0-e16c-11e5-8722-fef6f6021414']
> WITH collect(c) as categories
> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
> Post)-[:PREVIOUS*0..100]->(p)
> WITH p, [c in categories WHERE (p)-[:MEMBER_OF]->(c) | c.uuid] as 
> categories where size(categories) > 0
> RETURN p, categories 
> limit 20
>
>
>
> On Sat, Mar 5, 2016 at 6:17 PM, Jesse Hemingway <[email protected] 
> <javascript:>> wrote:
>
>> Thank you Michael, 
>>
>> This is great insight, and helps me better understand query structure. As 
>> I need the limit to be a parameter, I think it is not possible to use path 
>> length as the limiter. It also turns out not every post is categorized, so 
>> I found this variation on your suggestion to be the one that worked best, 
>> and cut DB hits by 45%!
>>
>> // Get latest posts and their categories
>> PROFILE MATCH (r:Reference {id: 
>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..]->(p)
>> WITH p LIMIT 20
>> OPTIONAL MATCH (p)-[:MEMBER_OF]->(c:Category)
>> RETURN p, COLLECT(c.uuid) as c
>>
>> Your second suggestion, collecting, sorting and unwinding the paths is 
>> complicated and harder to maintain/explain. Did you suggest this as an 
>> academic alternative, or potentially more performant (if I didn't have the 
>> parameter requirement)?
>>
>> Next, I sometimes need to get just the recent posts belonging to any of a 
>> set of categories (which will be a parameter as well). Could you comment on 
>> whether the following query could be further optimized? It takes over 7x 
>> more db hits than the simpler query (@ limit 20, and may be non-linear as 
>> limit grows, not sure why):
>>
>> // Get latest posts in 2 categories
>> PROFILE MATCH (r:Reference {id: 
>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
>> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414', 
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> WITH p, c LIMIT 20
>> RETURN p, COLLECT(c.uuid) as c
>>
>> Regards, and thanks again.
>>
>>
>>
>>
>>
>> On Saturday, March 5, 2016 at 4:40:18 AM UTC-6, Michael Hunger wrote:
>>>
>>> Hi Jesse,
>>>
>>> your first query has to aggregate over all the p's first to get the 
>>> cagetories (you never know if the last p is the same as a previous one in 
>>> the result set and has yet another category that you also have to collect 
>>> for the post)
>>>
>>> yours query just takes the first 50 pairs of posts and categories and 
>>> then groups them
>>>
>>> you could also do this which should be more efficient.
>>>
>>> PROFILE MATCH (r:Reference {id: "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)
>>> WITH p LIMIT 50
>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>> RETURN p, COLLECT(c.uuid) AS c
>>>
>>> Also if you're interested in only the latest 50 you can also just limit 
>>> the path length
>>>
>>> PROFILE MATCH (r:Reference {id: 
>>> "latest"})-->(:Post)-[:PREVIOUS*0..50]->(p)
>>> WITH p LIMIT 50
>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>> RETURN p, COLLECT(c.uuid) AS c
>>>
>>> also add the relationship-type between Reference and post, so that the 
>>> planner can do a better job
>>>
>>> another thing you can try is to just get the longest path and then take 
>>> all the nodes of the path.
>>>
>>> PROFILE MATCH (r:Reference {id: "latest"})-->(first:Post)
>>> MATCH path = (first)-[rels:PREVIOUS*0..50]->(p)
>>> WITH path
>>> ORDER BY size(rels) DESC LIMIT 1
>>> UNWIND nodes(path) as p
>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>> RETURN p, COLLECT(c.uuid) AS c
>>>
>>> HTH Michael
>>>
>>> On Fri, Mar 4, 2016 at 10:26 PM, Jesse Hemingway <[email protected]> 
>>> wrote:
>>>
>>>> I'm pretty new to neo4j, and I'm working on optimizing my queries. For 
>>>> my example, let's say we have a lot of 'Post' nodes that can grow in an 
>>>> unbounded fashion. I'm using a linked-list to be able to scan back through 
>>>> recent posts without consulting every node. This works, but my question 
>>>> has 
>>>> to do with a minor semantic change that means the difference between 
>>>> consulting LIMIT nodes, or ALL nodes. First, the bad one; this query 
>>>> consults all Post nodes:
>>>>
>>>> MATCH (r:Reference {id: 
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c LIMIT 50
>>>>
>>>> However, this next query only consults only the last 50, as I'd like. 
>>>> Could someone explain why putting the LIMIT on the WITH makes all the 
>>>> difference?
>>>>
>>>> PROFILE MATCH (r:Reference {id: 
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>> WITH p, c LIMIT 50
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> Thanks in advance for any insight!
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Understanding effect of LIMIT on planning

Reply via email to