Re: [Neo4j] Re: Poor performance on relatively big db

'Michael Hunger' via Neo4j Thu, 29 Dec 2016 13:08:28 -0800

It depends a bit.

If you can aggressively filter on both sides (doesn't seem so with your >
2M results).


Then you can do an index lookup for both sides and do a WHERE on the cross
product.

The other thing you can try is this (make sure n1 is the label with smaller
cardinality, which you can determine upfront quickly, by running MATCH
(:Label)-[:TYPE]->()  return count(*)

profile
MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes
MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes
RETURN COUNT(*) AS `Count`;


profile
MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
USING INDEX n1:Label1(prop1)
MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
USING INDEX n:Label2(prop2)
RETURN COUNT(*) AS `Count`;

profile
MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
USING INDEX n1:Label1(prop1)
MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1)
USING INDEX n:Label2(prop2)
RETURN COUNT(*) AS `Count`;

Please try these variants, I can imagine reasons for either being faster.
And let us know.

Your store is not that big (3M nodes 100M rels) but have you made sure it
is completely memory mapped?

Michael

On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <[email protected]> wrote:

> Thanks Michael, maybe using the boolean property as label would help but
> it's very complicated for us to do that because we are allowing the users
> to build their own data model, so there might be a lot of boolean
> properties, and we would have to turn all those properties into labels,
> making the database very complex and the way to retrieve data too.
> Also, it's not just about booleans, we also have other cases like
> filtering by strings instead of boolean, and that's slow too.
> The players/clubs graph was something i made up to protect our data.
> This is what we get when we try to filter on 2 properties on related nodes.
>
> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE
> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`;
> +---------+
> | Count   |
> +---------+
> | 2127237 |
> +---------+
> 1 row
> 5747 ms
>
> Compiler CYPHER 3.1
>
> Planner COST
>
> Runtime INTERPRETED
>
> +-------------------+----------------+---------+---------+--
> -----------+-----------------------------------------------+
> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   |
> Other                                         |
> +-------------------+----------------+---------+---------+--
> -----------+-----------------------------------------------+
> | +ProduceResults   |            498 |       1 |       0 | Count       |
> Count                                         |
> | |                 +----------------+---------+--
> -------+-------------+-----------------------------------------------+
> | +EagerAggregation |            498 |       1 |       0 | Count       |
>                                             |
> | |                 +----------------+---------+--
> -------+-------------+-----------------------------------------------+
> | +Filter           |         248298 | 2127237 | 4815283 | n, n1, r1   |
> n.prop1 == {  AUTOSTRING0} AND n:Label2       |
> | |                 +----------------+---------+--
> -------+-------------+-----------------------------------------------+
> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 |
> (n1)<-[r1:REL1]-(n)                           |
> | |                 +----------------+---------+--
> -------+-------------+-----------------------------------------------+
> | +NodeIndexSeek    |              1 |       1 |       2 | n1          |
> :Label1(prop2)                                |
> +-------------------+----------------+---------+---------+--
> -----------+-----------------------------------------------+
>
> Total database accesses: 7259857
>
>
>
> El miércoles, 28 de diciembre de 2016, 8:06:45 (UTC-3), Michael Hunger
> escribió:
>>
>> I would turn boolean properties, like international into a label.
>>
>>
>> like this:
>> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name='FC
>> Barcelona' RETURN COUNT(*) AS `Count`
>>
>> Something seems to be off with your data too.
>>
>> As there are hardly 2.2M international players in FC Barcelona (and 2.4M
>> in total) it will take a bit to expand / load them.
>> What kind of Hardware do you run this on?
>>
>> Or is this some computer game and players there?
>>
>> Please also try something like this:
>>
>> MATCH (c:Club {name:"FC Barcelona"})
>> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c)
>> RETURN count(*);
>>
>> Michael
>>
>>
>> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <[email protected]> wrote:
>>
>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE
>>> n.`international` = true AND n1.name='FC Barcelona' RETURN COUNT(*) AS
>>> `Count`;
>>> +---------+
>>> | Count   |
>>> +---------+
>>> | 2181771 |
>>> +---------+
>>> 1 row
>>> 7912 ms
>>>
>>> Compiler CYPHER 3.1
>>>
>>> Planner COST
>>>
>>> Runtime INTERPRETED
>>>
>>> +-------------------+----------------+---------+---------+--
>>> -----------+--------------------------------------------------+
>>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   |
>>> Other                                            |
>>> +-------------------+----------------+---------+---------+--
>>> -----------+--------------------------------------------------+
>>> | +ProduceResults   |            593 |       1 |       0 | Count       |
>>> Count                                            |
>>> | |                 +----------------+---------+--
>>> -------+-------------+--------------------------------------
>>> ------------+
>>> | +EagerAggregation |            593 |       1 |       0 | Count       |
>>>                                                  |
>>> | |                 +----------------+---------+--
>>> -------+-------------+--------------------------------------
>>> ------------+
>>> | +Filter           |         351532 | 2181771 | 4869817 | n, n1, r1   |
>>> n.international == {  AUTOBOOL0} AND n:Player    |
>>> | |                 +----------------+---------+--
>>> -------+-------------+--------------------------------------
>>> ------------+
>>> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 |
>>> (n1)<-[r1:CLUB]-(n)                              |
>>> | |                 +----------------+---------+--
>>> -------+-------------+--------------------------------------
>>> ------------+
>>> | +NodeIndexSeek    |              1 |       1 |       2 | n1          |
>>> :Club(name)                                      |
>>> +-------------------+----------------+---------+---------+--
>>> -----------+--------------------------------------------------+
>>>
>>> Total database accesses: 7314391
>>>
>>> That's the actual query and the execution plan. And yes, we do have
>>> indexes on :Club(name) and :Player(international)
>>>
>>>
>>>
>>>
>>> El martes, 27 de diciembre de 2016, 1:00:41 (UTC-3), Max De Marzi Jr.
>>> escribió:
>>>>
>>>>
>>>> Have you created Indexes for the fields you will be searching on? For
>>>> example:
>>>>
>>>> CREATE INDEX ON :Club(name)
>>>>
>>>> What is the actual query you are trying to optimize and what is the
>>>> actual query execution plan of the query?
>>>>
>>>> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote:
>>>>>
>>>>> Well, actually I was simplifying the query but what we really need to
>>>>> do is to filter by some property in club too (like club.name = 'FC
>>>>> Barcelona') so that solution wouldn't work...
>>>>>
>>>>>
>>>>> El 25 de dic. de 2016 13:27 -0300, Max De Marzi Jr. <
>>>>> [email protected]>, escribió:
>>>>>
>>>>> Try:
>>>>>
>>>>> MATCH (n:`Player`) WHERE n.`international` = true RETURN SUM(
>>>>> size((n)-[:CLUB]->()) ) AS `Count`;
>>>>>
>>>>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak
>>>>> wrote:
>>>>>>
>>>>>> We are working with the latest 3.1 version, we have a database of
>>>>>> around 3.5M nodes and 100M relationships.
>>>>>> Doing some basic queries is very slow that makes the application
>>>>>> unusable.
>>>>>>
>>>>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`)
>>>>>> WHERE n.`international` = true RETURN COUNT(*) AS `Count`;
>>>>>> +---------+
>>>>>> | Count   |
>>>>>> +---------+
>>>>>> | 2181771 |
>>>>>> +---------+
>>>>>> 1 row
>>>>>> 28676 ms
>>>>>>
>>>>>> Compiler CYPHER 3.1
>>>>>>
>>>>>> Planner COST
>>>>>>
>>>>>> Runtime INTERPRETED
>>>>>>
>>>>>> +-------------------+----------------+---------+---------+--
>>>>>> -----------+-----------------------------------------------+
>>>>>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables
>>>>>>   | Other                                         |
>>>>>> +-------------------+----------------+---------+---------+--
>>>>>> -----------+-----------------------------------------------+
>>>>>> | +ProduceResults   |           1027 |       1 |       0 | Count
>>>>>>   | Count                                         |
>>>>>> | |                 +----------------+---------+--
>>>>>> -------+-------------+--------------------------------------
>>>>>> ---------+
>>>>>> | +EagerAggregation |           1027 |       1 |       0 | Count
>>>>>>   |                                               |
>>>>>> | |                 +----------------+---------+--
>>>>>> -------+-------------+--------------------------------------
>>>>>> ---------+
>>>>>> | +Filter           |        1054596 | 2181771 | 4872405 | n, n1, r1
>>>>>>   | n.international == {  AUTOBOOL0} AND n:Player |
>>>>>> | |                 +----------------+---------+--
>>>>>> -------+-------------+--------------------------------------
>>>>>> ---------+
>>>>>> | +Expand(All)      |        2445865 | 2445865 | 2445868 | n, r1 --
>>>>>> n1 | (n1)<-[r1:CLUB]-(n)                           |
>>>>>> | |                 +----------------+---------+--
>>>>>> -------+-------------+--------------------------------------
>>>>>> ---------+
>>>>>> | +NodeByLabelScan  |              3 |       3 |       4 | n1
>>>>>>  | :Club                                         |
>>>>>> +-------------------+----------------+---------+---------+--
>>>>>> -----------+-----------------------------------------------+
>>>>>>
>>>>>> Total database accesses: 7318277
>>>>>>
>>>>>> There are around 2.3M players and 3 clubs.
>>>>>> It's taking 28secs the first time, then in takes around 9-10 secs.
>>>>>> It's really annoying because it's a very simple query.
>>>>>> Is there anything we are doing wrong? Is this something we can do
>>>>>> about or it's the expected behaviour? This is running on a 16gbRAM 
>>>>>> machine,
>>>>>> getting more RAM is the only way to improve it?
>>>>>>
>>>>>> Thanks,
>>>>>> Matias.
>>>>>>
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "Neo4j" group.
>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/to
>>>>> pic/neo4j/uZtQWJOC1HE/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Re: Poor performance on relatively big db

Reply via email to