Re: [Neo4j] How does Neo4j store intermediate results for MATCH queries?

chathura kankanamge Wed, 04 Jan 2017 15:08:08 -0800

Sorry about the mix-up with the profile and query. The profile was for the 
query (reproduced below) that was causing the memory issues.


MATCH (a:User)-[:TRUSTS]->(b:User)-[:TRUSTS]->(c:User), 
(a)-[:TRUSTS]->(d:User)-[:TRUSTS]->(c)
WITH count(*) AS cnt
RETURN cnt;
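
For reference, a minimal Python sketch of what this pattern counts, run on a 
toy edge list. The function name and the sample edges are hypothetical, and it 
assumes a graph with no self-loops and no parallel TRUSTS edges, so that 
Cypher's relationship uniqueness within a MATCH simply forces b and d to be 
different nodes. Under that assumption, a pair (a, c) connected by k two-hop 
paths contributes k * (k - 1) rows, so the number of matched rows grows 
quadratically in k. It says nothing about how Neo4j itself evaluates the query.

```python
from collections import defaultdict


def count_diamonds(edges):
    """Count (a, b, c, d) matches of a->b->c, a->d->c with b != d.

    Assumes no self-loops and no duplicate edges in `edges`.
    """
    # Adjacency sets: out[x] holds every node that x points to.
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    # two_hop[(a, c)] = number of distinct middle nodes on paths a->x->c.
    two_hop = defaultdict(int)
    for a, mids in out.items():
        for b in mids:
            # .get avoids inserting new keys into `out` while iterating it.
            for c in out.get(b, ()):
                two_hop[(a, c)] += 1

    # Each ordered pair of distinct middle nodes (b, d) is one matched row.
    return sum(k * (k - 1) for k in two_hop.values())


# Toy graph: node 1 reaches node 4 through 2, 3 and 5.
toy_edges = [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)]
print(count_diamonds(toy_edges))
```

Only one counter per (a, c) pair is kept here, rather than one row per match.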


The database is available at 
https://drive.google.com/file/d/0B4t4FZ7XDmELQjkxcW41NVdjdE0/view?usp=sharing.

Obviously I can't use the WHERE filtering you suggested on this query. It 
would be great if you could point me to why it uses so much memory.


PS:
The query I posted by mistake is one that matches triangular patterns in the 
DB. I tried the change you suggested, but it was about 40% slower than the 
earlier version. I would have expected the filtered version to be faster 
than the Cartesian product. Looking at the profiles, your version does an 
extra SemiApply operation. Maybe this is relatively expensive?

On Wednesday, January 4, 2017 at 11:31:50 AM UTC-5, Michael Hunger wrote:
>
> Can you try to use a WHERE condition instead.
>
> MATCH (a:User)-[:TRUSTS]->()-[:TRUSTS]->(c:User)
> WHERE (c)-[:TRUSTS]->(a)
>
> RETURN count(*) AS cnt;
>
> or
>
> MATCH (a:User)-[:TRUSTS]->()-[:TRUSTS]->(c:User)
> WITH a, c, count(*) AS count
> WHERE (c)-[:TRUSTS]->(a)
>
> RETURN sum(count) AS cnt;
>
>
> Also the profile you shared uses a different query.
> Do you have the database accessible somewhere?
>
>
> On Wed, Jan 4, 2017 at 2:04 AM, chathura kankanamge <
> [email protected]> wrote:
>
>> I ran a query to match all 'diamond patterns' on the Epinions1 dataset 
>> <https://snap.stanford.edu/data/soc-Epinions1.html> (75,877 nodes / 508,836 
>> edges) for an academic project and found that Neo4j needs >100 GB of heap 
>> during the evaluation. The query I used was:
>>
>>
>> MATCH (a:User)-[:TRUSTS]->(b:User)-[:TRUSTS]->(c:User), (c)-[:TRUSTS]->(a)
>> WITH count(*) AS cnt
>> RETURN cnt;
>>
>>
>>
>> <https://lh3.googleusercontent.com/-RQIVSMvrgFo/WGxGcafFogI/AAAAAAAAAio/98cmtz8j4fw6IyQswY6yi1cY3slJ_CxqQCLcB/s1600/neo4j_query_plan_epinions_diamond.png>
>>
>>
>> The count I got back was 286,371,276.
>>
>> The heap size seems too large given the number of intermediate results 
>> that the query plan estimates. 
>>
>> Could the memory usage be due to the final results being held in memory 
>> as well? How large are intermediate and final row objects on average?
>>
>>
>> Thanks,
>>
>> Chathura
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

