And specifying those indices explicitly cut the db hits in half! I had read
your Performance Tuning article and had put that little nugget on the back
burner, not realizing how critical it could be!
On Saturday, March 5, 2016 at 4:13:23 PM UTC-6, Jesse Hemingway wrote:
>
> Wow, there are a lot of ways to crack this nut. Thanks again for the
> thought-provoking options. For a moment, I thought I was getting the hang
> of Cypher, there is obviously I've only scratched the surface.
>
> In the meantime, I'm back to square 1 on the first query, as I realized my
> final solution, while having 45% fewer hits, no longer returns posts in
> descending order. I think I better re-read everything I've read so far
> about Cypher!
>
> To your point about generating the string rather than passing a parameter,
> this will eventually be used with a highly available API server, and my
> understanding was that parametric plans could be cached and reused on
> subsequent queries. My limit value may bounce around a lot depending on
> circumstances.
>
> On Saturday, March 5, 2016 at 3:21:57 PM UTC-6, Michael Hunger wrote:
>>
>> If the limit a parameter you could in those cases generate the string for
>> the statement and replace the numeric value.
>>
>> Or you could use WHERE size(rels) < {limit} but I would go for creating
>> the query string.
>>
>> the second query was not academic (my suggestions usually are not :) but
>> a *potentially* efficient query.
>> It finds the single longest path up to a limit, discarding the others and
>> using then the nodes of the path instead.
>>
>> For your other question. I'd try (presuming Category(uuid) is indexed:
>> either
>>
>> PROFILE MATCH (r:Reference {id:
>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
>> USING INDEX r:Reference(id)
>> USING INDEX c:Category(uuid)
>> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414',
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> WITH p, c LIMIT 20
>> RETURN p, COLLECT(c.uuid) as c
>>
>> OR
>>
>> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414',
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
>> Post)-[:PREVIOUS*0..100]->(p)
>> WHERE (p)-[:MEMBER_OF]->(c)
>> WITH p, c LIMIT 20
>> RETURN p, COLLECT(c.uuid) as c
>>
>> or
>>
>> MATCH (c:Category) WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414',
>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>> WITH collect(c) as categories
>> PROFILE MATCH (r:Reference {id: "latest"})-[:REFERS_TO]->(:
>> Post)-[:PREVIOUS*0..100]->(p)
>> WITH p, [c in categories WHERE (p)-[:MEMBER_OF]->(c) | c.uuid] as
>> categories where size(categories) > 0
>> RETURN p, categories
>> limit 20
>>
>>
>>
>> On Sat, Mar 5, 2016 at 6:17 PM, Jesse Hemingway <[email protected]>
>> wrote:
>>
>>> Thank you Michael,
>>>
>>> This is great insight, and helps me better understand query structure.
>>> As I need the limit to be a parameter, I think it is not possible to use
>>> path length as the limiter. It also turns out not every post is
>>> categorized, so I found this variation on your suggestion to be the one
>>> that worked best, and cut DB hits by 45%!
>>>
>>> // Get latest posts and their categories
>>> PROFILE MATCH (r:Reference {id:
>>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..]->(p)
>>> WITH p LIMIT 20
>>> OPTIONAL MATCH (p)-[:MEMBER_OF]->(c:Category)
>>> RETURN p, COLLECT(c.uuid) as c
>>>
>>> Your second suggestion, collecting, sorting and unwinding the paths is
>>> complicated and harder to maintain/explain. Did you suggest this as an
>>> academic alternative, or potentially more performant (if I didn't have the
>>> parameter requirement)?
>>>
>>> Next, I sometimes need to get just the recent posts belonging to any of
>>> a set of categories (which will be a parameter as well). Could you comment
>>> on whether the following query could be further optimized? It takes over 7x
>>> more db hits than the simpler query (@ limit 20, and may be non-linear as
>>> limit grows, not sure why):
>>>
>>> // Get latest posts in 2 categories
>>> PROFILE MATCH (r:Reference {id:
>>> "latest"})-[:REFERS_TO]->(:Post)-[:PREVIOUS*0..9999]->(p)-[:MEMBER_OF]->(c:Category)
>>> WHERE c.uuid IN ['b582bd70-e16c-11e5-8722-fef6f6021414',
>>> 'b583cee0-e16c-11e5-8722-fef6f6021414']
>>> WITH p, c LIMIT 20
>>> RETURN p, COLLECT(c.uuid) as c
>>>
>>> Regards, and thanks again.
>>>
>>>
>>>
>>>
>>>
>>> On Saturday, March 5, 2016 at 4:40:18 AM UTC-6, Michael Hunger wrote:
>>>>
>>>> Hi Jesse,
>>>>
>>>> your first query has to aggregate over all the p's first to get the
>>>> cagetories (you never know if the last p is the same as a previous one in
>>>> the result set and has yet another category that you also have to collect
>>>> for the post)
>>>>
>>>> yours query just takes the first 50 pairs of posts and categories and
>>>> then groups them
>>>>
>>>> you could also do this which should be more efficient.
>>>>
>>>> PROFILE MATCH (r:Reference {id:
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)
>>>> WITH p LIMIT 50
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> Also if you're interested in only the latest 50 you can also just limit
>>>> the path length
>>>>
>>>> PROFILE MATCH (r:Reference {id:
>>>> "latest"})-->(:Post)-[:PREVIOUS*0..50]->(p)
>>>> WITH p LIMIT 50
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> also add the relationship-type between Reference and post, so that the
>>>> planner can do a better job
>>>>
>>>> another thing you can try is to just get the longest path and then take
>>>> all the nodes of the path.
>>>>
>>>> PROFILE MATCH (r:Reference {id: "latest"})-->(first:Post)
>>>> MATCH path = (first)-[rels:PREVIOUS*0..50]->(p)
>>>> WITH path
>>>> ORDER BY size(rels) DESC LIMIT 1
>>>> UNWIND nodes(path) as p
>>>> MATCH (p)-[:MEMBER_OF]->(c:Category)
>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>
>>>> HTH Michael
>>>>
>>>> On Fri, Mar 4, 2016 at 10:26 PM, Jesse Hemingway <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm pretty new to neo4j, and I'm working on optimizing my queries. For
>>>>> my example, let's say we have a lot of 'Post' nodes that can grow in an
>>>>> unbounded fashion. I'm using a linked-list to be able to scan back
>>>>> through
>>>>> recent posts without consulting every node. This works, but my question
>>>>> has
>>>>> to do with a minor semantic change that means the difference between
>>>>> consulting LIMIT nodes, or ALL nodes. First, the bad one; this query
>>>>> consults all Post nodes:
>>>>>
>>>>> MATCH (r:Reference {id:
>>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>>> RETURN p, COLLECT(c.uuid) AS c LIMIT 50
>>>>>
>>>>> However, this next query only consults only the last 50, as I'd like.
>>>>> Could someone explain why putting the LIMIT on the WITH makes all the
>>>>> difference?
>>>>>
>>>>> PROFILE MATCH (r:Reference {id:
>>>>> "latest"})-->(:Post)-[:PREVIOUS*0..]->(p)-[:MEMBER_OF]->(c:Category)
>>>>> WITH p, c LIMIT 50
>>>>> RETURN p, COLLECT(c.uuid) AS c
>>>>>
>>>>> Thanks in advance for any insight!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.