Re: [Neo4j] Re: Poor performance on relatively big db

'Michael Hunger' via Neo4j Thu, 29 Dec 2016 13:12:38 -0800

In 3.1  (and then also 3.2)
you could also use the compiled runtime (except for the count)


So a query like

cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse
 profile
MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND
n1.`prop2`='bbb' RETURN 1

should use the compiled runtime and run faster than the interpreted
runtime. In 3.2 it will be again (runtime=compiled) or actually the default
in enterprise.

Michael


On Thu, Dec 29, 2016 at 10:07 PM, Michael Hunger <
[email protected]> wrote:

> It depends a bit.
>
> If you can aggressively filter on both sides (doesn't seem so with your >
> 2M results).
>
> Then you can do an index lookup for both sides and do a WHERE on the cross
> product.
>
> The other thing you can try is this (make sure n1 is the label with
> smaller cardinality, which you can determine upfront quickly, by running
> MATCH (:Label)-[:TYPE]->()  return count(*)
>
> profile
> MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes
> MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes
> RETURN COUNT(*) AS `Count`;
>
>
> profile
> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
> USING INDEX n1:Label1(prop1)
> MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
> USING INDEX n:Label2(prop2)
> RETURN COUNT(*) AS `Count`;
>
> profile
> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
> USING INDEX n1:Label1(prop1)
> MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1)
> USING INDEX n:Label2(prop2)
> RETURN COUNT(*) AS `Count`;
>
> Please try these variants, I can imagine reasons for either being faster.
> And let us know.
>
> Your store is not that big (3M nodes 100M rels) but have you made sure it
> is completely memory mapped?
>
> Michael
>
> On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <[email protected]> wrote:
>
>> Thanks Michael, maybe using the boolean property as label would help but
>> it's very complicated for us to do that because we are allowing the users
>> to build their own data model, so there might be a lot of boolean
>> properties, and we would have to turn all those properties into labels,
>> making the database very complex and the way to retrieve data too.
>> Also, it's not just about booleans, we also have other cases like
>> filtering by strings instead of boolean, and that's slow too.
>> The players/clubs graph was something i made up to protect our data.
>> This is what we get when we try to filter on 2 properties on related
>> nodes.
>>
>> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE
>> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`;
>> +---------+
>> | Count   |
>> +---------+
>> | 2127237 |
>> +---------+
>> 1 row
>> 5747 ms
>>
>> Compiler CYPHER 3.1
>>
>> Planner COST
>>
>> Runtime INTERPRETED
>>
>> +-------------------+----------------+---------+---------+--
>> -----------+-----------------------------------------------+
>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   |
>> Other                                         |
>> +-------------------+----------------+---------+---------+--
>> -----------+-----------------------------------------------+
>> | +ProduceResults   |            498 |       1 |       0 | Count       |
>> Count                                         |
>> | |                 +----------------+---------+--
>> -------+-------------+-----------------------------------------------+
>> | +EagerAggregation |            498 |       1 |       0 | Count       |
>>                                               |
>> | |                 +----------------+---------+--
>> -------+-------------+-----------------------------------------------+
>> | +Filter           |         248298 | 2127237 | 4815283 | n, n1, r1   |
>> n.prop1 == {  AUTOSTRING0} AND n:Label2       |
>> | |                 +----------------+---------+--
>> -------+-------------+-----------------------------------------------+
>> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 |
>> (n1)<-[r1:REL1]-(n)                           |
>> | |                 +----------------+---------+--
>> -------+-------------+-----------------------------------------------+
>> | +NodeIndexSeek    |              1 |       1 |       2 | n1          |
>> :Label1(prop2)                                |
>> +-------------------+----------------+---------+---------+--
>> -----------+-----------------------------------------------+
>>
>> Total database accesses: 7259857
>>
>>
>>
>> El miércoles, 28 de diciembre de 2016, 8:06:45 (UTC-3), Michael Hunger
>> escribió:
>>>
>>> I would turn boolean properties, like international into a label.
>>>
>>>
>>> like this:
>>> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name='FC
>>> Barcelona' RETURN COUNT(*) AS `Count`
>>>
>>> Something seems to be off with your data too.
>>>
>>> As there are hardly 2.2M international players in FC Barcelona (and 2.4M
>>> in total) it will take a bit to expand / load them.
>>> What kind of Hardware do you run this on?
>>>
>>> Or is this some computer game and players there?
>>>
>>> Please also try something like this:
>>>
>>> MATCH (c:Club {name:"FC Barcelona"})
>>> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c)
>>> RETURN count(*);
>>>
>>> Michael
>>>
>>>
>>> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <[email protected]> wrote:
>>>
>>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE
>>>> n.`international` = true AND n1.name='FC Barcelona' RETURN COUNT(*) AS
>>>> `Count`;
>>>> +---------+
>>>> | Count   |
>>>> +---------+
>>>> | 2181771 |
>>>> +---------+
>>>> 1 row
>>>> 7912 ms
>>>>
>>>> Compiler CYPHER 3.1
>>>>
>>>> Planner COST
>>>>
>>>> Runtime INTERPRETED
>>>>
>>>> +-------------------+----------------+---------+---------+--
>>>> -----------+--------------------------------------------------+
>>>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables
>>>> | Other                                            |
>>>> +-------------------+----------------+---------+---------+--
>>>> -----------+--------------------------------------------------+
>>>> | +ProduceResults   |            593 |       1 |       0 | Count
>>>> | Count                                            |
>>>> | |                 +----------------+---------+--
>>>> -------+-------------+--------------------------------------
>>>> ------------+
>>>> | +EagerAggregation |            593 |       1 |       0 | Count
>>>> |                                                  |
>>>> | |                 +----------------+---------+--
>>>> -------+-------------+--------------------------------------
>>>> ------------+
>>>> | +Filter           |         351532 | 2181771 | 4869817 | n, n1, r1
>>>> | n.international == {  AUTOBOOL0} AND n:Player    |
>>>> | |                 +----------------+---------+--
>>>> -------+-------------+--------------------------------------
>>>> ------------+
>>>> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1
>>>> | (n1)<-[r1:CLUB]-(n)                              |
>>>> | |                 +----------------+---------+--
>>>> -------+-------------+--------------------------------------
>>>> ------------+
>>>> | +NodeIndexSeek    |              1 |       1 |       2 | n1
>>>>  | :Club(name)                                      |
>>>> +-------------------+----------------+---------+---------+--
>>>> -----------+--------------------------------------------------+
>>>>
>>>> Total database accesses: 7314391
>>>>
>>>> That's the actual query and the execution plan. And yes, we do have
>>>> indexes on :Club(name) and :Player(international)
>>>>
>>>>
>>>>
>>>>
>>>> El martes, 27 de diciembre de 2016, 1:00:41 (UTC-3), Max De Marzi Jr.
>>>> escribió:
>>>>>
>>>>>
>>>>> Have you created Indexes for the fields you will be searching on? For
>>>>> example:
>>>>>
>>>>> CREATE INDEX ON :Club(name)
>>>>>
>>>>> What is the actual query you are trying to optimize and what is the
>>>>> actual query execution plan of the query?
>>>>>
>>>>> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote:
>>>>>>
>>>>>> Well, actually I was simplifying the query but what we really need to
>>>>>> do is to filter by some property in club too (like club.name = 'FC
>>>>>> Barcelona') so that solution wouldn't work...
>>>>>>
>>>>>>
>>>>>> El 25 de dic. de 2016 13:27 -0300, Max De Marzi Jr. <
>>>>>> [email protected]>, escribió:
>>>>>>
>>>>>> Try:
>>>>>>
>>>>>> MATCH (n:`Player`) WHERE n.`international` = true RETURN SUM(
>>>>>> size((n)-[:CLUB]->()) ) AS `Count`;
>>>>>>
>>>>>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak
>>>>>> wrote:
>>>>>>>
>>>>>>> We are working with the latest 3.1 version, we have a database of
>>>>>>> around 3.5M nodes and 100M relationships.
>>>>>>> Doing some basic queries is very slow that makes the application
>>>>>>> unusable.
>>>>>>>
>>>>>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`)
>>>>>>> WHERE n.`international` = true RETURN COUNT(*) AS `Count`;
>>>>>>> +---------+
>>>>>>> | Count   |
>>>>>>> +---------+
>>>>>>> | 2181771 |
>>>>>>> +---------+
>>>>>>> 1 row
>>>>>>> 28676 ms
>>>>>>>
>>>>>>> Compiler CYPHER 3.1
>>>>>>>
>>>>>>> Planner COST
>>>>>>>
>>>>>>> Runtime INTERPRETED
>>>>>>>
>>>>>>> +-------------------+----------------+---------+---------+--
>>>>>>> -----------+-----------------------------------------------+
>>>>>>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables
>>>>>>>   | Other                                         |
>>>>>>> +-------------------+----------------+---------+---------+--
>>>>>>> -----------+-----------------------------------------------+
>>>>>>> | +ProduceResults   |           1027 |       1 |       0 | Count
>>>>>>>   | Count                                         |
>>>>>>> | |                 +----------------+---------+--
>>>>>>> -------+-------------+--------------------------------------
>>>>>>> ---------+
>>>>>>> | +EagerAggregation |           1027 |       1 |       0 | Count
>>>>>>>   |                                               |
>>>>>>> | |                 +----------------+---------+--
>>>>>>> -------+-------------+--------------------------------------
>>>>>>> ---------+
>>>>>>> | +Filter           |        1054596 | 2181771 | 4872405 | n, n1,
>>>>>>> r1   | n.international == {  AUTOBOOL0} AND n:Player |
>>>>>>> | |                 +----------------+---------+--
>>>>>>> -------+-------------+--------------------------------------
>>>>>>> ---------+
>>>>>>> | +Expand(All)      |        2445865 | 2445865 | 2445868 | n, r1 --
>>>>>>> n1 | (n1)<-[r1:CLUB]-(n)                           |
>>>>>>> | |                 +----------------+---------+--
>>>>>>> -------+-------------+--------------------------------------
>>>>>>> ---------+
>>>>>>> | +NodeByLabelScan  |              3 |       3 |       4 | n1
>>>>>>>    | :Club                                         |
>>>>>>> +-------------------+----------------+---------+---------+--
>>>>>>> -----------+-----------------------------------------------+
>>>>>>>
>>>>>>> Total database accesses: 7318277
>>>>>>>
>>>>>>> There are around 2.3M players and 3 clubs.
>>>>>>> It's taking 28secs the first time, then in takes around 9-10 secs.
>>>>>>> It's really annoying because it's a very simple query.
>>>>>>> Is there anything we are doing wrong? Is this something we can do
>>>>>>> about or it's the expected behaviour? This is running on a 16gbRAM 
>>>>>>> machine,
>>>>>>> getting more RAM is the only way to improve it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Matias.
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to a topic in
>>>>>> the Google Groups "Neo4j" group.
>>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/to
>>>>>> pic/neo4j/uZtQWJOC1HE/unsubscribe.
>>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>>> [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Re: Poor performance on relatively big db

Reply via email to