Neat. The idea is to limit down the # of rows processed at a time as much as possible.
In your case the subgraph at the beginning is quite limiting, and only then you go further and match the rest (which expands again) I got your global query down to 30s by reordering the matches Starting from the BusRels (fewest entries) (but is not a big difference you can also start with c:Company) this order of matches proved most effective to prune down the # of rows processed in between. match (c)-[:has_objective]->(:BusRel)-[:looks_for]->(:BusRel)<-[:has_objective]-(p) with distinct c,p match (c)-[:operates_in_sector]->()-[:has_child*0..1]-()<-[:operates_in_sector]-(p) with distinct c,p match (c)-[:operates_in_region]->()<-[:operates_in_region]-(p) with distinct c,p with c.name as name , count(p) as cnt return count(*) ; Hope it helps. Michael Am 28.01.2014 um 16:12 schrieb Ziv Unger <[email protected]>: > Sorry for the email spam, but based on your suggestion I stripped the Label > lookups on the paths that can't be any other type of label, and distilled the > query down to this: > > match (c:Company {id: > 6})-[:operates_in_region]->(regions)<-[:operates_in_region]-(p:Company),(c)-[:has_objective]->()-[:looks_for]->(busrels)<-[:has_objective]-(p) > match > (c)-[:operates_in_sector]->(sectors)-[:has_child*0..1]-()<-[:operates_in_sector]-(p) > return p.name, collect(distinct(busrels.name)), > collect(distinct(sectors.name)), collect(distinct(regions.name)); > > Which is incredibly returning at 22ms! That's pretty damn impressive. I've > implemented the same changes in the summary query, but still can't get it to > return in decent amount of time. I'll keep trying this side, any more help in > that regard is much appreciated. > > Thanks again! > > On Tuesday, 28 January 2014 16:55:35 UTC+2, Ziv Unger wrote: > Completely missed your last line with the modified query! > > I just ran that in a console and it returns around 33ms! > > I tried having all 3 matches comma separated in one statement, which raised > the time to over 700ms, and had it as a continuation of the cypher pattern > from the first (p:Person), but never thought to just have two. I was working > on the theory from the Webinar in which you said that multiple patterns > should be separated out into separate MATCH statements. How does one know > what will make a difference like this, as in my mind I'd expect the results > to be similar? > > On Tuesday, 28 January 2014 14:17:08 UTC+2, Michael Hunger wrote: > > I think you already did a really good job. > > How did you measure? With browser? As Wes said in the webinar that's quite > unreliable. So either measure with neo4j-shell or with your own, parametrized > code. > > What kind of indexes/constraints do you have in your db? > e.g. unique constraint on :Company(id) or :Region(name) ? > > You might drop label checks in places where due to the relationship-types > there is only one label possible anyway. > > Did you try to pull more of the pattern in a single match? > > Also probably sensible to look at the cardinalities in between and try to > reduce them if they are getting too high? (e.g. with WITH distinct > c,p,regions.name, busrels.name > > How do individual parts of the pattern perform? > >> match (c:Company {id: >> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company),(c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p) >> match >> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p) >> return p.name, collect(distinct(busrels.name)), >> collect(distinct(sectors.name)), collect(distinct(regions.name)) > Any chance to get your db (privately) for testing? > > Am 28.01.2014 um 12:18 schrieb Ziv Unger <[email protected]>: > >> Hi >> >> I have a relatively small database, consisting of the following elements, >> each of which is defined as a Label: >> >> Company >> BusRel (business relationship) >> Region >> Sector >> >> Sectors are a parent/child hierarchy (max. 1 level deep), with a >> [:has_child] relationship between a parent and children. (approximately 122 >> nodes total) >> >> Regions are also a parent/child hierarchy, but are present for example as: >> (:Region {name: 'Africa'})-[:has_child]->(:Region {name: 'South >> Africa'})-[:has_child]->(:Region {name: 'Gauteng'})-[:has_child]->(:Region >> {name: 'Pretoria'}), ie. Continent all the way down to City with a >> :has_child relationship from parent to child. (approximately 50000 nodes >> total) >> >> BusRel are business relationships. There are matching pairs which are linked >> to each other (bi-directional links), for example: {:BusRel {name: 'Looking >> for a distributor'})<-[:looks_for]->(:BusRel {name: 'Looking to >> distribute'}) (approximately 18 nodes total) >> >> Companies aren't linked to each other directly, but rather via one or more >> of the Labels above. A company could have multiple BusRels and Sectors, but >> only one Region (at present). (approximately 10000 nodes total) >> >> I've attached a screenshot from the web admin to illustrate an example, the >> nodes being: >> >> Blue: Company >> Grey: Region >> Yellow: BusRel >> Orange: Sector >> >> I've written a query to start at a specified company (there is a unique >> constraint on Company property "id") and retrieve all other Companies which >> share (1) one or more corresponding business relationship (2) the same >> region (3) one or more sectors. It looks like this: >> >> match (c:Company {id: >> 6})-[:operates_in_region]->(regions:Region)<-[:operates_in_region]-(p:Company) >> match >> (c)-[:operates_in_sector]->(sectors:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p) >> match >> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(busrels:BusRel)<-[:has_objective]-(p) >> return p.name, collect(distinct(busrels.name)), >> collect(distinct(sectors.name)), collect(distinct(regions.name)) >> >> This is the product of some query tuning based on some reading and a recent >> video by Wes and Mark. The problem is that this query returns in about 100 - >> 120ms, which seems slow to me. (please correct me if I'm wrong, but I've >> seen examples of far more complex db's and queries returning in under 20ms) >> >> I've tried profiling it, which returns the following: >> >> ColumnFilter(symKeys=["p.name", " >> INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f", " >> INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457", " >> INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f"], >> returnItemNames=["p.name", "collect(distinct(busrels.name))", >> "collect(distinct(sectors.name))", "collect(distinct(regions.name))"], >> _rows=4, _db_hits=0) >> EagerAggregation(keys=["Cached(p.name of type Any)"], aggregates=["( >> INTERNAL_AGGREGATEd6949509-3054-42f0-99dd-d0389f6b0e7f,Distinct(Collect(Property(busrels,name(1))),Property(busrels,name(1))))", >> "( >> INTERNAL_AGGREGATE7016f1ed-68e5-4260-bbaf-7c513953e457,Distinct(Collect(Property(sectors,name(1))),Property(sectors,name(1))))", >> "( >> INTERNAL_AGGREGATEaf0e7813-ba3e-48b4-a3bf-a2aa25085a4f,Distinct(Collect(Property(regions,name(1))),Property(regions,name(1))))"], >> _rows=4, _db_hits=48) >> Extract(symKeys=["sectors", " UNNAMED170", " UNNAMED178", " >> UNNAMED215", " UNNAMED150", " UNNAMED110", "regions", " UNNAMED274", "p", >> " UNNAMED25", "c", " UNNAMED65", " UNNAMED235", "busrels", " >> UNNAMED243"], exprKeys=["p.name"], _rows=10, _db_hits=10) >> Filter(pred="(((hasLabel( UNNAMED235:BusRel(3)) AND hasLabel( >> UNNAMED235:BusRel(3))) AND hasLabel(busrels:BusRel(3))) AND >> hasLabel(busrels:BusRel(3)))", _rows=10, _db_hits=0) >> PatternMatch(g="(c)-[' UNNAMED215']-( UNNAMED235),(p)-[' >> UNNAMED274']-(busrels),( UNNAMED235)-[' UNNAMED243']-(busrels)", _rows=10, >> _db_hits=0) >> Filter(pred="(((hasLabel(sectors:Sector(0)) AND hasLabel( >> UNNAMED170:Sector(0))) AND hasLabel( UNNAMED170:Sector(0))) AND >> hasLabel(sectors:Sector(0)))", _rows=130, _db_hits=0) >> PatternMatch(g="(c)-[' UNNAMED110']-(sectors),(p)-[' >> UNNAMED178']-( UNNAMED170),( UNNAMED170)-[' UNNAMED150']-(sectors)", >> _rows=130, _db_hits=2733) >> Filter(pred="hasLabel(p:Company(2))", _rows=584, _db_hits=0) >> TraversalMatcher(trail="(c)-[ UNNAMED25:operates_in_region >> WHERE (hasLabel(NodeIdentifier():Region(1)) AND >> hasLabel(NodeIdentifier():Region(1))) AND true]->(regions)<-[ >> UNNAMED65:operates_in_region WHERE hasLabel(NodeIdentifier():Company(2)) AND >> true]-(p)", _rows=584, _db_hits=586) >> >> There hardly seem to be enough db_hits or rows being processed to warrant a >> 100ms execution time. I'd be happy retrieving only the first matched region, >> sector and busrel to optimize the query so that it would stop searching for >> more matches between two companies and return faster. Ie. I just need one >> match across the bridging nodes, not all of them, but I'm not sure if that's >> possible / how to do that. >> >> This is running on my development machine, which is an i5 2500K with 8GB of >> RAM and the DB is on a fast SSD. I've tried increasing the JVM heap size >> from the default to 1GB, then 2GB and then 4GB with the same results. Any >> help / tips would be appreciated. >> >> The second part of the question involves writing a "summary" query, to >> create a paged list of Companies and total number of matches, using the same >> criteria. I've come up with this, sans paging: >> >> match >> (c:Company)-[:operates_in_region]->(:Region)<-[:operates_in_region]-(p:Company) >> match >> (c)-[:operates_in_sector]->(:Sector)-[:has_child*0..1]-(:Sector)<-[:operates_in_sector]-(p) >> match >> (c)-[:has_objective]->(:BusRel)-[:looks_for]->(:BusRel)<-[:has_objective]-(p) >> return c.name, count(p) >> >> But I suspect there is a serious flaw in the query as this never returns. >> (or at least it hasn't before hitting the 30s maximum execution time I've >> set) >> >> How would I go about constructing something like that? >> >> Many thanks in advance. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/groups/opt_out. >> <debug.png> > > > -- > You received this message because you are subscribed to the Google Groups > "Neo4j" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/groups/opt_out. -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/groups/opt_out.
