I think you already did a really good job.
How did you measure? With browser? As Wes said in the webinar that's quite
unreliable. So either measure with neo4j-shell or with your own, parametrized
code.
What kind of indexes/constraints do you have in your db?
e.g. unique constraint on :Company(id) or :Region(name) ?
You might drop label checks in places where due to the relationship-types there
is only one label possible anyway.
Did you try to pull more of the pattern in a single match?
Also probably sensible to look at the cardinalities in between and try to
reduce them if they are getting too high? (e.g. with WITH distinct
c,p,regions.name, busrels.name
How do individual parts of the pattern perform?
> match (c:Company {id:
> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company),(c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
> match
> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> return p.name, collect(distinct(busrels.name)),
> collect(distinct(sectors.name)), collect(distinct(regions.name))
Any chance to get your db (privately) for testing?
Am 28.01.2014 um 12:18 schrieb Ziv Unger <[email protected]>:
> Hi
>
> I have a relatively small database, consisting of the following elements,
> each of which is defined as a Label:
>
> Company
> BusRel (business relationship)
> Region
> Sector
>
> Sectors are a parent/child hierarchy (max. 1 level deep), with a [:has_child]
> relationship between a parent and children. (approximately 122 nodes total)
>
> Regions are also a parent/child hierarchy, but are present for example as:
> (:Region {name: 'Africa'})-[:has_child]->(:Region {name: 'South
> Africa'})-[:has_child]->(:Region {name: 'Gauteng'})-[:has_child]->(:Region
> {name: 'Pretoria'}), ie. Continent all the way down to City with a :has_child
> relationship from parent to child. (approximately 50000 nodes total)
>
> BusRel are business relationships. There are matching pairs which are linked
> to each other (bi-directional links), for example: {:BusRel {name: 'Looking
> for a distributor'})<-[:looks_for]->(:BusRel {name: 'Looking to distribute'})
> (approximately 18 nodes total)
>
> Companies aren't linked to each other directly, but rather via one or more of
> the Labels above. A company could have multiple BusRels and Sectors, but only
> one Region (at present). (approximately 10000 nodes total)
>
> I've attached a screenshot from the web admin to illustrate an example, the
> nodes being:
>
> Blue: Company
> Grey: Region
> Yellow: BusRel
> Orange: Sector
>
> I've written a query to start at a specified company (there is a unique
> constraint on Company property "id") and retrieve all other Companies which
> share (1) one or more corresponding business relationship (2) the same region
> (3) one or more sectors. It looks like this:
>
> match (c:Company {id:
> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company)
> match
> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match
> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p)
> return p.name, collect(distinct(busrels.name)),
> collect(distinct(sectors.name)), collect(distinct(regions.name))
>
> This is the product of some query tuning based on some reading and a recent
> video by Wes and Mark. The problem is that this query returns in about 100 -
> 120ms, which seems slow to me. (please correct me if I'm wrong, but I've seen
> examples of far more complex db's and queries returning in under 20ms)
>
> I've tried profiling it, which returns the following:
>
> ColumnFilter(symKeys=["p.name", "
> INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f", "
> INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457", "
> INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f"],
> returnItemNames=["p.name", "collect(distinct(busrels.name))",
> "collect(distinct(sectors.name))", "collect(distinct(regions.name))"],
> _rows=4, _db_hits=0)
> EagerAggregation(keys=["Cached(p.name of type Any)"], aggregates=["(
> INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f,Distinct(Collect(Property(busrels,name(1))),Property(busrels,name(1))))",
> "(
> INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457,Distinct(Collect(Property(sectors,name(1))),Property(sectors,name(1))))",
> "(
> INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f,Distinct(Collect(Property(regions,name(1))),Property(regions,name(1))))"],
> _rows=4, _db_hits=48)
> Extract(symKeys=["sectors", " UNNAMED170", " UNNAMED178", " UNNAMED215",
> " UNNAMED150", " UNNAMED110", "regions", " UNNAMED274", "p", "
> UNNAMED25", "c", " UNNAMED65", " UNNAMED235", "busrels", " UNNAMED243"],
> exprKeys=["p.name"], _rows=10, _db_hits=10)
> Filter(pred="(((hasLabel( UNNAMED235:BusRel(3)) AND hasLabel(
> UNNAMED235:BusRel(3))) AND hasLabel(busrels:BusRel(3))) AND
> hasLabel(busrels:BusRel(3)))", _rows=10, _db_hits=0)
> PatternMatch(g="(c)-[' UNNAMED215']-( UNNAMED235),(p)-['
> UNNAMED274']-(busrels),( UNNAMED235)-[' UNNAMED243']-(busrels)", _rows=10,
> _db_hits=0)
> Filter(pred="(((hasLabel(sectors:Sector(0)) AND hasLabel(
> UNNAMED170:Sector(0))) AND hasLabel( UNNAMED170:Sector(0))) AND
> hasLabel(sectors:Sector(0)))", _rows=130, _db_hits=0)
> PatternMatch(g="(c)-[' UNNAMED110']-(sectors),(p)-['
> UNNAMED178']-( UNNAMED170),( UNNAMED170)-[' UNNAMED150']-(sectors)",
> _rows=130, _db_hits=2733)
> Filter(pred="hasLabel(p:Company(2))", _rows=584, _db_hits=0)
> TraversalMatcher(trail="(c)-[ UNNAMED25:operates_in_region
> WHERE (hasLabel(NodeIdentifier():Region(1)) AND
> hasLabel(NodeIdentifier():Region(1))) AND true]->(regions)<-[
> UNNAMED65:operates_in_region WHERE hasLabel(NodeIdentifier():Company(2)) AND
> true]-(p)", _rows=584, _db_hits=586)
>
> There hardly seem to be enough db_hits or rows being processed to warrant a
> 100ms execution time. I'd be happy retrieving only the first matched region,
> sector and busrel to optimize the query so that it would stop searching for
> more matches between two companies and return faster. Ie. I just need one
> match across the bridging nodes, not all of them, but I'm not sure if that's
> possible / how to do that.
>
> This is running on my development machine, which is an i5 2500K with 8GB of
> RAM and the DB is on a fast SSD. I've tried increasing the JVM heap size from
> the default to 1GB, then 2GB and then 4GB with the same results. Any help /
> tips would be appreciated.
>
> The second part of the question involves writing a "summary" query, to create
> a paged list of Companies and total number of matches, using the same
> criteria. I've come up with this, sans paging:
>
> match
> (c:Company)-[:operates_in_region]->(:Region)<-[:operates_in_region]-(p:Company)
> match
> (c)-[:operates_in_sector]->(:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p)
> match
> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(:BusRel)<-[:has_objective]-(p)
> return c.name, count(p)
>
> But I suspect there is a serious flaw in the query as this never returns. (or
> at least it hasn't before hitting the 30s maximum execution time I've set)
>
> How would I go about constructing something like that?
>
> Many thanks in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
> <debug.png>
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.