In 3.1 (and then also 3.2) you could also use the compiled runtime (except for the count)
So a query like cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN 1 should use the compiled runtime and run faster than the interpreted runtime. In 3.2 it will be again (runtime=compiled) or actually the default in enterprise. Michael On Thu, Dec 29, 2016 at 10:07 PM, Michael Hunger < [email protected]> wrote: > It depends a bit. > > If you can aggressively filter on both sides (doesn't seem so with your > > 2M results). > > Then you can do an index lookup for both sides and do a WHERE on the cross > product. > > The other thing you can try is this (make sure n1 is the label with > smaller cardinality, which you can determine upfront quickly, by running > MATCH (:Label)-[:TYPE]->() return count(*) > > profile > MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes > MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes > RETURN COUNT(*) AS `Count`; > > > profile > MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb' > USING INDEX n1:Label1(prop1) > MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa' > USING INDEX n:Label2(prop2) > RETURN COUNT(*) AS `Count`; > > profile > MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb' > USING INDEX n1:Label1(prop1) > MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1) > USING INDEX n:Label2(prop2) > RETURN COUNT(*) AS `Count`; > > Please try these variants, I can imagine reasons for either being faster. > And let us know. > > Your store is not that big (3M nodes 100M rels) but have you made sure it > is completely memory mapped? > > Michael > > On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <[email protected]> wrote: > >> Thanks Michael, maybe using the boolean property as label would help but >> it's very complicated for us to do that because we are allowing the users >> to build their own data model, so there might be a lot of boolean >> properties, and we would have to turn all those properties into labels, >> making the database very complex and the way to retrieve data too. >> Also, it's not just about booleans, we also have other cases like >> filtering by strings instead of boolean, and that's slow too. >> The players/clubs graph was something i made up to protect our data. >> This is what we get when we try to filter on 2 properties on related >> nodes. >> >> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE >> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`; >> +---------+ >> | Count | >> +---------+ >> | 2127237 | >> +---------+ >> 1 row >> 5747 ms >> >> Compiler CYPHER 3.1 >> >> Planner COST >> >> Runtime INTERPRETED >> >> +-------------------+----------------+---------+---------+-- >> -----------+-----------------------------------------------+ >> | Operator | Estimated Rows | Rows | DB Hits | Variables | >> Other | >> +-------------------+----------------+---------+---------+-- >> -----------+-----------------------------------------------+ >> | +ProduceResults | 498 | 1 | 0 | Count | >> Count | >> | | +----------------+---------+-- >> -------+-------------+-----------------------------------------------+ >> | +EagerAggregation | 498 | 1 | 0 | Count | >> | >> | | +----------------+---------+-- >> -------+-------------+-----------------------------------------------+ >> | +Filter | 248298 | 2127237 | 4815283 | n, n1, r1 | >> n.prop1 == { AUTOSTRING0} AND n:Label2 | >> | | +----------------+---------+-- >> -------+-------------+-----------------------------------------------+ >> | +Expand(All) | 815288 | 2444571 | 2444572 | n, r1 -- n1 | >> (n1)<-[r1:REL1]-(n) | >> | | +----------------+---------+-- >> -------+-------------+-----------------------------------------------+ >> | +NodeIndexSeek | 1 | 1 | 2 | n1 | >> :Label1(prop2) | >> +-------------------+----------------+---------+---------+-- >> -----------+-----------------------------------------------+ >> >> Total database accesses: 7259857 >> >> >> >> El miércoles, 28 de diciembre de 2016, 8:06:45 (UTC-3), Michael Hunger >> escribió: >>> >>> I would turn boolean properties, like international into a label. >>> >>> >>> like this: >>> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name='FC >>> Barcelona' RETURN COUNT(*) AS `Count` >>> >>> Something seems to be off with your data too. >>> >>> As there are hardly 2.2M international players in FC Barcelona (and 2.4M >>> in total) it will take a bit to expand / load them. >>> What kind of Hardware do you run this on? >>> >>> Or is this some computer game and players there? >>> >>> Please also try something like this: >>> >>> MATCH (c:Club {name:"FC Barcelona"}) >>> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c) >>> RETURN count(*); >>> >>> Michael >>> >>> >>> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <[email protected]> wrote: >>> >>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE >>>> n.`international` = true AND n1.name='FC Barcelona' RETURN COUNT(*) AS >>>> `Count`; >>>> +---------+ >>>> | Count | >>>> +---------+ >>>> | 2181771 | >>>> +---------+ >>>> 1 row >>>> 7912 ms >>>> >>>> Compiler CYPHER 3.1 >>>> >>>> Planner COST >>>> >>>> Runtime INTERPRETED >>>> >>>> +-------------------+----------------+---------+---------+-- >>>> -----------+--------------------------------------------------+ >>>> | Operator | Estimated Rows | Rows | DB Hits | Variables >>>> | Other | >>>> +-------------------+----------------+---------+---------+-- >>>> -----------+--------------------------------------------------+ >>>> | +ProduceResults | 593 | 1 | 0 | Count >>>> | Count | >>>> | | +----------------+---------+-- >>>> -------+-------------+-------------------------------------- >>>> ------------+ >>>> | +EagerAggregation | 593 | 1 | 0 | Count >>>> | | >>>> | | +----------------+---------+-- >>>> -------+-------------+-------------------------------------- >>>> ------------+ >>>> | +Filter | 351532 | 2181771 | 4869817 | n, n1, r1 >>>> | n.international == { AUTOBOOL0} AND n:Player | >>>> | | +----------------+---------+-- >>>> -------+-------------+-------------------------------------- >>>> ------------+ >>>> | +Expand(All) | 815288 | 2444571 | 2444572 | n, r1 -- n1 >>>> | (n1)<-[r1:CLUB]-(n) | >>>> | | +----------------+---------+-- >>>> -------+-------------+-------------------------------------- >>>> ------------+ >>>> | +NodeIndexSeek | 1 | 1 | 2 | n1 >>>> | :Club(name) | >>>> +-------------------+----------------+---------+---------+-- >>>> -----------+--------------------------------------------------+ >>>> >>>> Total database accesses: 7314391 >>>> >>>> That's the actual query and the execution plan. And yes, we do have >>>> indexes on :Club(name) and :Player(international) >>>> >>>> >>>> >>>> >>>> El martes, 27 de diciembre de 2016, 1:00:41 (UTC-3), Max De Marzi Jr. >>>> escribió: >>>>> >>>>> >>>>> Have you created Indexes for the fields you will be searching on? For >>>>> example: >>>>> >>>>> CREATE INDEX ON :Club(name) >>>>> >>>>> What is the actual query you are trying to optimize and what is the >>>>> actual query execution plan of the query? >>>>> >>>>> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote: >>>>>> >>>>>> Well, actually I was simplifying the query but what we really need to >>>>>> do is to filter by some property in club too (like club.name = 'FC >>>>>> Barcelona') so that solution wouldn't work... >>>>>> >>>>>> >>>>>> El 25 de dic. de 2016 13:27 -0300, Max De Marzi Jr. < >>>>>> [email protected]>, escribió: >>>>>> >>>>>> Try: >>>>>> >>>>>> MATCH (n:`Player`) WHERE n.`international` = true RETURN SUM( >>>>>> size((n)-[:CLUB]->()) ) AS `Count`; >>>>>> >>>>>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak >>>>>> wrote: >>>>>>> >>>>>>> We are working with the latest 3.1 version, we have a database of >>>>>>> around 3.5M nodes and 100M relationships. >>>>>>> Doing some basic queries is very slow that makes the application >>>>>>> unusable. >>>>>>> >>>>>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) >>>>>>> WHERE n.`international` = true RETURN COUNT(*) AS `Count`; >>>>>>> +---------+ >>>>>>> | Count | >>>>>>> +---------+ >>>>>>> | 2181771 | >>>>>>> +---------+ >>>>>>> 1 row >>>>>>> 28676 ms >>>>>>> >>>>>>> Compiler CYPHER 3.1 >>>>>>> >>>>>>> Planner COST >>>>>>> >>>>>>> Runtime INTERPRETED >>>>>>> >>>>>>> +-------------------+----------------+---------+---------+-- >>>>>>> -----------+-----------------------------------------------+ >>>>>>> | Operator | Estimated Rows | Rows | DB Hits | Variables >>>>>>> | Other | >>>>>>> +-------------------+----------------+---------+---------+-- >>>>>>> -----------+-----------------------------------------------+ >>>>>>> | +ProduceResults | 1027 | 1 | 0 | Count >>>>>>> | Count | >>>>>>> | | +----------------+---------+-- >>>>>>> -------+-------------+-------------------------------------- >>>>>>> ---------+ >>>>>>> | +EagerAggregation | 1027 | 1 | 0 | Count >>>>>>> | | >>>>>>> | | +----------------+---------+-- >>>>>>> -------+-------------+-------------------------------------- >>>>>>> ---------+ >>>>>>> | +Filter | 1054596 | 2181771 | 4872405 | n, n1, >>>>>>> r1 | n.international == { AUTOBOOL0} AND n:Player | >>>>>>> | | +----------------+---------+-- >>>>>>> -------+-------------+-------------------------------------- >>>>>>> ---------+ >>>>>>> | +Expand(All) | 2445865 | 2445865 | 2445868 | n, r1 -- >>>>>>> n1 | (n1)<-[r1:CLUB]-(n) | >>>>>>> | | +----------------+---------+-- >>>>>>> -------+-------------+-------------------------------------- >>>>>>> ---------+ >>>>>>> | +NodeByLabelScan | 3 | 3 | 4 | n1 >>>>>>> | :Club | >>>>>>> +-------------------+----------------+---------+---------+-- >>>>>>> -----------+-----------------------------------------------+ >>>>>>> >>>>>>> Total database accesses: 7318277 >>>>>>> >>>>>>> There are around 2.3M players and 3 clubs. >>>>>>> It's taking 28secs the first time, then in takes around 9-10 secs. >>>>>>> It's really annoying because it's a very simple query. >>>>>>> Is there anything we are doing wrong? Is this something we can do >>>>>>> about or it's the expected behaviour? This is running on a 16gbRAM >>>>>>> machine, >>>>>>> getting more RAM is the only way to improve it? >>>>>>> >>>>>>> Thanks, >>>>>>> Matias. >>>>>>> >>>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to a topic in >>>>>> the Google Groups "Neo4j" group. >>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/to >>>>>> pic/neo4j/uZtQWJOC1HE/unsubscribe. >>>>>> To unsubscribe from this group and all its topics, send an email to >>>>>> [email protected]. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Neo4j" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
