Re: [Neo4j] Slow cypher query in relatively small data set, and help writing a related query

Ziv Unger Tue, 28 Jan 2014 06:50:08 -0800

Thanks for the reply! I have forwarded you a link to the DB in an email.

I measured in the browser and in neo4j-shell. Both give me the same 
result, namely 100-120ms once the cache has warmed up.


The DB is an export of the basic structure of a PostgreSQL DB, so at the 
moment lookups are based on the primary key id of the record in Postgres. 
As such, all of the nodes have id and name properties, and all of the 
labels have a unique constraint on id.
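
For reference, a minimal sketch of what such constraints look like in 
Neo4j 2.0 syntax. The label names are the four from the original message; 
the exact statements used for this DB are an assumption:

```cypher
// Assumed: one unique constraint per label on the imported Postgres key.
// A unique constraint also creates the index that backs lookups such as
// (c:Company {id: 6}).
CREATE CONSTRAINT ON (c:Company) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (r:Region)  ASSERT r.id IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector)  ASSERT s.id IS UNIQUE;
CREATE CONSTRAINT ON (b:BusRel)  ASSERT b.id IS UNIQUE;
```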

I did try to put more of the pattern in a single match, but did not notice 
an improvement.

I tried various combinations of WITH in between the matches to try to 
filter out some nodes, but avoided DISTINCT as I thought that might slow 
the query down further. I'll give it a shot now though.
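
A rough, untested sketch of the WITH DISTINCT variant, assuming the same 
labels and relationship types as the original query. Whether it helps 
depends on how many duplicate rows each intermediate match produces:

```cypher
// Same pattern as the original query, with intermediate rows de-duplicated
// after each MATCH so later matches start from fewer rows.
MATCH (c:Company {id: 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company)
WITH DISTINCT c, p, regions
MATCH (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
WITH DISTINCT c, p, regions, sectors
MATCH (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
RETURN p.name,
       collect(distinct busrels.name),
       collect(distinct sectors.name),
       collect(distinct regions.name)
```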

The individual matches each take about the same time, 100-120ms.

Thanks again!

On Tuesday, 28 January 2014 14:17:08 UTC+2, Michael Hunger wrote:
>
>
> I think you already did a really good job.
>
> How did you measure? With browser? As Wes said in the webinar that's quite 
> unreliable. So either measure with neo4j-shell or with your own, 
> parametrized code.
>
> What kind of indexes/constraints do you have in your db?
> e.g. unique constraint on :Company(id) or :Region(name) ?
>
> You might drop label checks in places where due to the relationship-types 
> there is only one label possible anyway.
>
> Did you try to pull more of the pattern in a single match?
>
> Also probably sensible to look at the cardinalities in between and try to 
> reduce them if they are getting too high (e.g. with WITH distinct c, p, 
> regions.name, busrels.name)?
>
> How do individual parts of the pattern perform?
>
> match (c:Company {id: 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company),
>       (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
> match (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> return p.name, collect(distinct(busrels.name)), collect(distinct(sectors.name)),
>        collect(distinct(regions.name))
>
> Any chance to get your db (privately) for testing?
>
> On 28.01.2014 at 12:18, Ziv Unger <[email protected]> wrote:
>
> Hi
>
> I have a relatively small database, consisting of the following elements, 
> each of which is defined as a Label:
>
> Company
> BusRel (business relationship)
> Region
> Sector
>
> Sectors are a parent/child hierarchy (max. 1 level deep), with a 
> [:has_child] relationship between a parent and children. (approximately 122 
> nodes total)
>
> Regions are also a parent/child hierarchy, but are present for example as: 
> (:Region {name: 'Africa'})-[:has_child]->(:Region {name: 'South 
> Africa'})-[:has_child]->(:Region {name: 'Gauteng'})-[:has_child]->(:Region 
> {name: 'Pretoria'}), i.e. from continent all the way down to city, with a 
> :has_child relationship from parent to child. (approximately 50000 nodes 
> total)
>
> BusRel are business relationships. There are matching pairs which are 
> linked to each other (bi-directional links), for example: (:BusRel {name: 
> 'Looking for a distributor'})<-[:looks_for]->(:BusRel {name: 'Looking to 
> distribute'}) (approximately 18 nodes total)
>
> Companies aren't linked to each other directly, but rather via one or more 
> of the Labels above. A company could have multiple BusRels and Sectors, but 
> only one Region (at present). (approximately 10000 nodes total)
>
> I've attached a screenshot from the web admin to illustrate an example, 
> the nodes being:
>
> Blue: Company
> Grey: Region
> Yellow: BusRel
> Orange: Sector
>
> I've written a query to start at a specified company (there is a unique 
> constraint on Company property "id") and retrieve all other Companies which 
> share (1) one or more corresponding business relationships, (2) the same 
> region and (3) one or more sectors. It looks like this:
>
> match (c:Company {id: 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company)
> match (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
> return p.name, collect(distinct(busrels.name)), collect(distinct(sectors.name)),
>        collect(distinct(regions.name))
>
> This is the product of some query tuning based on some reading and a 
> recent video by Wes and Mark. The problem is that this query returns in 
> about 100-120ms, which seems slow to me. (Please correct me if I'm wrong, 
> but I've seen examples of far more complex DBs and queries returning in 
> under 20ms.)
>
> I've tried profiling it, which returns the following:
>
> ColumnFilter(symKeys=["p.name", "  INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f", "  INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457", "  INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f"], returnItemNames=["p.name", "collect(distinct(busrels.name))", "collect(distinct(sectors.name))", "collect(distinct(regions.name))"], _rows=4, _db_hits=0)
> EagerAggregation(keys=["Cached(p.name of type Any)"], aggregates=["(  INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f,Distinct(Collect(Property(busrels,name(1))),Property(busrels,name(1))))", "(  INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457,Distinct(Collect(Property(sectors,name(1))),Property(sectors,name(1))))", "(  INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f,Distinct(Collect(Property(regions,name(1))),Property(regions,name(1))))"], _rows=4, _db_hits=48)
>   Extract(symKeys=["sectors", "  UNNAMED170", "  UNNAMED178", "  UNNAMED215", "  UNNAMED150", "  UNNAMED110", "regions", "  UNNAMED274", "p", "  UNNAMED25", "c", "  UNNAMED65", "  UNNAMED235", "busrels", "  UNNAMED243"], exprKeys=["p.name"], _rows=10, _db_hits=10)
>     Filter(pred="(((hasLabel(  UNNAMED235:BusRel(3)) AND hasLabel(  UNNAMED235:BusRel(3))) AND hasLabel(busrels:BusRel(3))) AND hasLabel(busrels:BusRel(3)))", _rows=10, _db_hits=0)
>       PatternMatch(g="(c)-['  UNNAMED215']-(  UNNAMED235),(p)-['  UNNAMED274']-(busrels),(  UNNAMED235)-['  UNNAMED243']-(busrels)", _rows=10, _db_hits=0)
>         Filter(pred="(((hasLabel(sectors:Sector(0)) AND hasLabel(  UNNAMED170:Sector(0))) AND hasLabel(  UNNAMED170:Sector(0))) AND hasLabel(sectors:Sector(0)))", _rows=130, _db_hits=0)
>           PatternMatch(g="(c)-['  UNNAMED110']-(sectors),(p)-['  UNNAMED178']-(  UNNAMED170),(  UNNAMED170)-['  UNNAMED150']-(sectors)", _rows=130, _db_hits=2733)
>             Filter(pred="hasLabel(p:Company(2))", _rows=584, _db_hits=0)
>               TraversalMatcher(trail="(c)-[  UNNAMED25:operates_in_region WHERE (hasLabel(NodeIdentifier():Region(1)) AND hasLabel(NodeIdentifier():Region(1))) AND true]->(regions)<-[  UNNAMED65:operates_in_region WHERE hasLabel(NodeIdentifier():Company(2)) AND true]-(p)", _rows=584, _db_hits=586)
>
> There hardly seem to be enough db_hits or rows being processed to warrant 
> a 100ms execution time. I'd be happy retrieving only the first matched 
> region, sector and busrel to optimize the query so that it would stop 
> searching for more matches between two companies and return faster. I.e. I 
> just need one match across the bridging nodes, not all of them, but I'm not 
> sure if that's possible / how to do that.
>
> This is running on my development machine, which is an i5 2500K with 8GB 
> of RAM and the DB is on a fast SSD. I've tried increasing the JVM heap size 
> from the default to 1GB, then 2GB and then 4GB with the same results. Any 
> help / tips would be appreciated.
>
> The second part of the question involves writing a "summary" query, to 
> create a paged list of Companies and total number of matches, using the 
> same criteria. I've come up with this, sans paging:
>
> match 
> (c:Company)-[:operates_in_region]->(:Region)<-[:operates_in_region]-(p:Company)
> match 
> (c)-[:operates_in_sector]->(:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match 
> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(:BusRel)<-[:has_objective]-(p)
> return c.name, count(p)
>
> But I suspect there is a serious flaw in the query, as this never returns 
> (or at least it hasn't before hitting the 30s maximum execution time I've 
> set).
>
> How would I go about constructing something like that?
>
> Many thanks in advance.
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
> <debug.png>
>
>
>

