Re: [Neo4j] Slow cypher query in relatively small data set, and help writing a related query

Ziv Unger Tue, 28 Jan 2014 07:03:38 -0800

Completely missed your last line with the modified query!

I just ran that in a console and it returns around 33ms!


I tried having all 3 matches comma separated in one statement, which raised 
the time to over 700ms, and had it as a continuation of the cypher pattern 
from the first (p:Person), but never thought to just have two. I was 
working on the theory from the Webinar in which you said that multiple 
patterns should be separated out into separate MATCH statements. How  does 
one know what will make a difference like this, as in my mind I'd expect 
the results to be similar?

On Tuesday, 28 January 2014 14:17:08 UTC+2, Michael Hunger wrote:
>
>
> I think you already did a really good job.
>
> How did you measure? With browser? As Wes said in the webinar that's quite 
> unreliable. So either measure with neo4j-shell or with your own, 
> parametrized code.
>
> What kind of indexes/constraints do you have in your db?
> e.g. unique constraint on :Company(id) or :Region(name) ?
>
> You might drop label checks in places where due to the relationship-types 
> there is only one label possible anyway.
>
> Did you try to pull more of the pattern in a single match?
>
> Also probably sensible to look at the cardinalities in between and try to 
> reduce them if they are getting too high? (e.g. with WITH distinct c,p,
> regions.name, busrels.name
>
> How do individual parts of the pattern perform?
>
> match (c:Company {id: 
> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company),(c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
>
> match 
> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> return p.name, collect(distinct(busrels.name)), collect(distinct(
> sectors.name)), collect(distinct(regions.name)) 
>
> Any chance to get your db (privately) for testing?
>
> Am 28.01.2014 um 12:18 schrieb Ziv Unger <[email protected] <javascript:>
> >:
>
> Hi
>
> I have a relatively small database, consisting of the following elements, 
> each of which is defined as a Label:
>
> Company
> BusRel (business relationship)
> Region
> Sector
>
> Sectors are a parent/child hierarchy (max. 1 level deep), with a 
> [:has_child] relationship between a parent and children. (approximately 122 
> nodes total)
>
> Regions are also a parent/child hierarchy, but are present for example as: 
> (:Region {name: 'Africa'})-[:has_child]->(:Region {name: 'South 
> Africa'})-[:has_child]->(:Region {name: 'Gauteng'})-[:has_child]->(:Region 
> {name: 'Pretoria'}), ie. Continent all the way down to City with a 
> :has_child relationship from parent to child. (approximately 50000 nodes 
> total)
>
> BusRel are business relationships. There are matching pairs which are 
> linked to each other (bi-directional links), for example: {:BusRel {name: 
> 'Looking for a distributor'})<-[:looks_for]->(:BusRel {name: 'Looking to 
> distribute'})  (approximately 18 nodes total)
>
> Companies aren't linked to each other directly, but rather via one or more 
> of the Labels above. A company could have multiple BusRels and Sectors, but 
> only one Region (at present). (approximately 10000 nodes total)
>
> I've attached a screenshot from the web admin to illustrate an example, 
> the nodes being:
>
> Blue: Company
> Grey: Region
> Yellow: BusRel
> Orange: Sector
>
> I've written a query to start at a specified company (there is a unique 
> constraint on Company property "id") and retrieve all other Companies which 
> share (1) one or more corresponding business relationship (2) the same 
> region (3) one or more sectors. It looks like this:
>
> match (c:Company {id: 
> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company)
> match 
> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match 
> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
> return p.name, collect(distinct(busrels.name)), collect(distinct(
> sectors.name)), collect(distinct(regions.name)) 
>
> This is the product of some query tuning based on some reading and a 
> recent video by Wes and Mark. The problem is that this query returns in 
> about 100 - 120ms, which seems slow to me. (please correct me if I'm wrong, 
> but I've seen examples of far more complex db's and queries returning in 
> under 20ms)
>
> I've tried profiling it, which returns the following:
>
> ColumnFilter(symKeys=["p.name", " 
>  INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f", " 
>  INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457", " 
>  INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f"], 
> returnItemNames=["p.name", "collect(distinct(busrels.name))", 
> "collect(distinct(sectors.name))", "collect(distinct(regions.name))"], 
> _rows=4, _db_hits=0)
> EagerAggregation(keys=["Cached(p.name of type Any)"], aggregates=["( 
>  
> INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f,Distinct(Collect(Property(busrels,name(1))),Property(busrels,name(1))))",
>  
> "( 
>  
> INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457,Distinct(Collect(Property(sectors,name(1))),Property(sectors,name(1))))",
>  
> "( 
>  
> INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f,Distinct(Collect(Property(regions,name(1))),Property(regions,name(1))))"],
>  
> _rows=4, _db_hits=48)
>   Extract(symKeys=["sectors", "  UNNAMED170", "  UNNAMED178", " 
>  UNNAMED215", "  UNNAMED150", "  UNNAMED110", "regions", "  UNNAMED274", 
> "p", "  UNNAMED25", "c", "  UNNAMED65", "  UNNAMED235", "busrels", " 
>  UNNAMED243"], exprKeys=["p.name"], _rows=10, _db_hits=10)
>     Filter(pred="(((hasLabel(  UNNAMED235:BusRel(3)) AND hasLabel( 
>  UNNAMED235:BusRel(3))) AND hasLabel(busrels:BusRel(3))) AND 
> hasLabel(busrels:BusRel(3)))", _rows=10, _db_hits=0)
>       PatternMatch(g="(c)-['  UNNAMED215']-(  UNNAMED235),(p)-[' 
>  UNNAMED274']-(busrels),(  UNNAMED235)-['  UNNAMED243']-(busrels)", 
> _rows=10, _db_hits=0)
>         Filter(pred="(((hasLabel(sectors:Sector(0)) AND hasLabel( 
>  UNNAMED170:Sector(0))) AND hasLabel(  UNNAMED170:Sector(0))) AND 
> hasLabel(sectors:Sector(0)))", _rows=130, _db_hits=0)
>           PatternMatch(g="(c)-['  UNNAMED110']-(sectors),(p)-[' 
>  UNNAMED178']-(  UNNAMED170),(  UNNAMED170)-['  UNNAMED150']-(sectors)", 
> _rows=130, _db_hits=2733)
>             Filter(pred="hasLabel(p:Company(2))", _rows=584, _db_hits=0)
>               TraversalMatcher(trail="(c)-[  UNNAMED25:operates_in_region 
> WHERE (hasLabel(NodeIdentifier():Region(1)) AND 
> hasLabel(NodeIdentifier():Region(1))) AND true]->(regions)<-[ 
>  UNNAMED65:operates_in_region WHERE hasLabel(NodeIdentifier():Company(2)) 
> AND true]-(p)", _rows=584, _db_hits=586)
>
> There hardly seem to be enough db_hits or rows being processed to warrant 
> a 100ms execution time. I'd be happy retrieving only the first matched 
> region, sector and busrel to optimize the query so that it would stop 
> searching for more matches between two companies and return faster. Ie. I 
> just need one match across the bridging nodes, not all of them, but I'm not 
> sure if that's possible / how to do that.
>
> This is running on my development machine, which is an i5 2500K with 8GB 
> of RAM and the DB is on a fast SSD. I've tried increasing the JVM heap size 
> from the default to 1GB, then 2GB and then 4GB with the same results. Any 
> help / tips would be appreciated.
>
> The second part of the question involves writing a "summary" query, to 
> create a paged list of Companies and total number of matches, using the 
> same criteria. I've come up with this, sans paging:
>
> match 
> (c:Company)-[:operates_in_region]->(:Region)<-[:operates_in_region]-(p:Company)
> match 
> (c)-[:operates_in_sector]->(:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match 
> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(:BusRel)<-[:has_objective]-(p)
> return c.name, count(p)
>
> But I suspect there is a serious flaw in the query as this never returns. 
> (or at least it hasn't before hitting the 30s maximum execution time I've 
> set)
>
> How would I go about constructing something like that?
>
> Many thanks in advance.
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <javascript:>.
> For more options, visit https://groups.google.com/groups/opt_out.
> <debug.png>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Re: [Neo4j] Slow cypher query in relatively small data set, and help writing a related query

Reply via email to