Re: [Neo4j] Re: Poor performance on relatively big db

Matias Burak Fri, 30 Dec 2016 06:33:09 -0800

I think most of the slowness comes from the count. If I just fetch a small set 
using LIMIT, it works fine.


Also, while trying to test your suggestions, I wasn't able to get USING INDEX to 
work. Is there anything wrong with the syntax?

neo4j> profile
       MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
       USING INDEX n1:Label1(prop1)
       MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
       USING INDEX n:Label2(prop2)
       RETURN COUNT(*) AS `Count`;
Invalid input 'S': expected 'n/N' (line 4, column 2 (offset: 56))
"USING INDEX n1:Label1(prop1)"
            ^
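
The parser error above is consistent with where the hints sit: Cypher expects a USING INDEX hint directly after the MATCH pattern and before that MATCH's WHERE, so a hint placed after WHERE is read as the start of a different keyword (hence the complaint at the 'S'). A minimal sketch of the same query with only the hints moved, assuming indexes on :Label1(prop1) and :Label2(prop2) exist; whether the planner can honor both hints at once may depend on the version:

```cypher
// Hints moved between each MATCH pattern and its WHERE; nothing else changed.
// Assumes indexes on :Label1(prop1) and :Label2(prop2) already exist.
PROFILE
MATCH (n1:`Label1`)
USING INDEX n1:Label1(prop1)
WHERE n1.`prop1` = 'bbb'
MATCH (n:`Label2`)-[:REL1]->(n1)
USING INDEX n:Label2(prop2)
WHERE n.`prop2` = 'aaa'
RETURN COUNT(*) AS `Count`;
```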

Also, how can I make sure it is completely memory mapped? I have 16 GB in the 
environment I'm testing on, and I left the Neo4j memory settings at their defaults.
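
One way to look at this, sketched under assumptions: in Neo4j 3.x the page cache size is set by dbms.memory.pagecache.size in neo4j.conf, and the store can only be fully memory mapped if that value is at least the combined size of the store files on disk. The built-in dbms.queryJmx procedure lists the JMX beans the server exposes. The filter on 'Page cache' below is an assumption about the bean name and may match nothing on editions that do not publish that bean:

```cypher
// List JMX beans whose name mentions the page cache.
// The 'Page cache' name filter is an assumption, not a confirmed bean name.
CALL dbms.queryJmx('org.neo4j:*') YIELD name, attributes
WHERE name CONTAINS 'Page cache'
RETURN name, attributes;
```

If a matching bean is returned, a fault counter that keeps growing while the count query is repeated would suggest the store does not fit in the page cache.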
 
> On 29 Dec 2016, at 18:12, 'Michael Hunger' via Neo4j 
> <[email protected]> wrote:
> 
> In 3.1  (and then also 3.2)
> you could also use the compiled runtime (except for the count)
> 
> So a query like
> 
> cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse 
> profile 
> MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND 
> n1.`prop2`='bbb' RETURN 1
> 
> should use the compiled runtime and run faster than the interpreted runtime. 
> In 3.2 it will again be available as runtime=compiled, or will actually be 
> the default in Enterprise.
> 
> Michael
> 
> 
> On Thu, Dec 29, 2016 at 10:07 PM, Michael Hunger <[email protected]> wrote:
> It depends a bit.
> 
> If you can aggressively filter on both sides (which doesn't seem to be the 
> case with your more than 2M results), then you can do an index lookup for 
> both sides and do a WHERE on the cross product.
> 
> The other thing you can try is this (make sure n1 is the label with smaller 
> cardinality, which you can determine upfront quickly by running MATCH 
> (:Label)-[:TYPE]->() RETURN count(*)):
> 
> profile 
> MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes
> MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes 
> RETURN COUNT(*) AS `Count`;
> 
> 
> profile 
> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
> USING INDEX n1:Label1(prop1)
> MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
> USING INDEX n:Label2(prop2)
> RETURN COUNT(*) AS `Count`;
> 
> profile 
> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
> USING INDEX n1:Label1(prop1)
> MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1)
> USING INDEX n:Label2(prop2)
> RETURN COUNT(*) AS `Count`;
> 
> Please try these variants, I can imagine reasons for either being faster.
> And let us know.
> 
> Your store is not that big (3M nodes 100M rels) but have you made sure it is 
> completely memory mapped?
> 
> Michael
> 
> On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <[email protected]> wrote:
> Thanks Michael, maybe using the boolean property as label would help but it's 
> very complicated for us to do that because we are allowing the users to build 
> their own data model, so there might be a lot of boolean properties, and we 
> would have to turn all those properties into labels, making the database very 
> complex and the way to retrieve data too.
> Also, it's not just about booleans, we also have other cases like filtering 
> by strings instead of boolean, and that's slow too.
> The players/clubs graph was something I made up to protect our data. 
> This is what we get when we try to filter on 2 properties on related nodes.
> 
> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE 
> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`;
> +---------+
> | Count   |
> +---------+
> | 2127237 |
> +---------+
> 1 row
> 5747 ms
> 
> Compiler CYPHER 3.1
> 
> Planner COST
> 
> Runtime INTERPRETED
> 
> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                         |
> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
> | +ProduceResults   |            498 |       1 |       0 | Count       | Count                                         |
> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
> | +EagerAggregation |            498 |       1 |       0 | Count       |                                               |
> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
> | +Filter           |         248298 | 2127237 | 4815283 | n, n1, r1   | n.prop1 == {  AUTOSTRING0} AND n:Label2       |
> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | (n1)<-[r1:REL1]-(n)                           |
> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
> | +NodeIndexSeek    |              1 |       1 |       2 | n1          | :Label1(prop2)                                |
> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
> 
> Total database accesses: 7259857
> 
> 
> 
> On Wednesday, 28 December 2016, 8:06:45 (UTC-3), Michael Hunger wrote:
> I would turn boolean properties, like international into a label.
> 
> 
> like this:
> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name='FC Barcelona' 
> RETURN COUNT(*) AS `Count`
> 
> Something seems to be off with your data too.
> 
> As there are hardly 2.2M international players in FC Barcelona (and 2.4M in 
> total) it will take a bit to expand / load them.
> What kind of Hardware do you run this on?
> 
> Or is this some computer game and players there?
> 
> Please also try something like this:
> 
> MATCH (c:Club {name:"FC Barcelona"})
> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c)
> RETURN count(*);
> 
> Michael
> 
> 
> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <[email protected]> wrote:
> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE 
> n.`international` = true AND n1.name='FC Barcelona' RETURN 
> COUNT(*) AS `Count`;
> +---------+
> | Count   |
> +---------+
> | 2181771 |
> +---------+
> 1 row
> 7912 ms
> 
> Compiler CYPHER 3.1
> 
> Planner COST
> 
> Runtime INTERPRETED
> 
> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                            |
> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
> | +ProduceResults   |            593 |       1 |       0 | Count       | Count                                            |
> | |                 +----------------+---------+---------+-------------+--------------------------------------------------+
> | +EagerAggregation |            593 |       1 |       0 | Count       |                                                  |
> | |                 +----------------+---------+---------+-------------+--------------------------------------------------+
> | +Filter           |         351532 | 2181771 | 4869817 | n, n1, r1   | n.international == {  AUTOBOOL0} AND n:Player    |
> | |                 +----------------+---------+---------+-------------+--------------------------------------------------+
> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | (n1)<-[r1:CLUB]-(n)                              |
> | |                 +----------------+---------+---------+-------------+--------------------------------------------------+
> | +NodeIndexSeek    |              1 |       1 |       2 | n1          | :Club(name)                                      |
> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
> 
> Total database accesses: 7314391
> 
> That's the actual query and the execution plan. And yes, we do have indexes 
> on :Club(name) and :Player(international)
> 
> 
> 
> 
> On Tuesday, 27 December 2016, 1:00:41 (UTC-3), Max De Marzi Jr. wrote:
> 
> Have you created Indexes for the fields you will be searching on? For example:
> 
> CREATE INDEX ON :Club(name)
> 
> What is the actual query you are trying to optimize and what is the actual 
> query execution plan of the query?
> 
> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote:
> Well, actually I was simplifying the query but what we really need to do is 
> to filter by some property in club too (like club.name = 'FC Barcelona') so 
> that solution wouldn't work...
> 
> 
> On 25 Dec 2016 at 13:27 -0300, Max De Marzi Jr. <[email protected]> wrote:
>> Try:
>> 
>> MATCH (n:`Player`) WHERE n.`international` = true RETURN 
>> SUM(size((n)-[:CLUB]->()) ) AS `Count`;
>> 
>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak wrote:
>> We are working with the latest 3.1 version, and we have a database of around 
>> 3.5M nodes and 100M relationships.
>> Some basic queries are so slow that the application is unusable.
>> 
>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE 
>> n.`international` = true RETURN COUNT(*) AS `Count`;
>> +---------+
>> | Count   |
>> +---------+
>> | 2181771 |
>> +---------+
>> 1 row
>> 28676 ms
>> 
>> Compiler CYPHER 3.1
>> 
>> Planner COST
>> 
>> Runtime INTERPRETED
>> 
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                         |
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> | +ProduceResults   |           1027 |       1 |       0 | Count       | Count                                         |
>> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +EagerAggregation |           1027 |       1 |       0 | Count       |                                               |
>> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +Filter           |        1054596 | 2181771 | 4872405 | n, n1, r1   | n.international == {  AUTOBOOL0} AND n:Player |
>> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +Expand(All)      |        2445865 | 2445865 | 2445868 | n, r1 -- n1 | (n1)<-[r1:CLUB]-(n)                           |
>> | |                 +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +NodeByLabelScan  |              3 |       3 |       4 | n1          | :Club                                         |
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> 
>> Total database accesses: 7318277
>> There are around 2.3M players and 3 clubs. 
>> It takes 28 seconds the first time, then around 9-10 seconds.
>> It's really annoying because it's a very simple query.
>> Is there anything we are doing wrong? Is there something we can do about it, 
>> or is this the expected behaviour? This is running on a machine with 16 GB of 
>> RAM. Is getting more RAM the only way to improve it?
>> 
>> Thanks,
>> Matias. 
>>  
>> 
>> --
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Neo4j" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe 
>> <https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe>.
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected] <>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
