Re: Parent/Child query performance in version 1.1.2

Clinton Gormley Mon, 25 Aug 2014 07:55:58 -0700

Something else to note: parent-child now uses global ordinals to make
queries 3x faster than they were previously, but global ordinals need to be
rebuilt after the index has refreshed (assuming some data has changed).


Currently there is no way to refresh p/c global ordinals "eagerly" (i.e.
during the refresh phase), so it happens on the first query after a
refresh.  1.3.3 and 1.4.0 will include an option to allow eager building of
global ordinals, which should remove this latency spike:
https://github.com/elasticsearch/elasticsearch/issues/7394
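
As a rough sketch of what enabling eager building might look like once the
option ships: this assumes it is exposed as a `fielddata.loading` value of
`eager_global_ordinals` on the child type's `_parent` mapping, which is how
the 1.x documentation later described it. The child type `social` is taken
from the query quoted below; the parent type name `post` is purely
hypothetical.

```python
import json

# Hypothetical parent type name; the thread only names the child type ("social").
PARENT_TYPE = "post"
CHILD_TYPE = "social"


def eager_parent_mapping(parent_type):
    """Build a _parent mapping fragment that requests eager global ordinals."""
    return {
        "_parent": {
            "type": parent_type,
            # Assumed option name and placement; check the docs for your version.
            "fielddata": {"loading": "eager_global_ordinals"},
        }
    }


# Mapping body keyed by the child type, ready to be serialized as JSON.
mapping_body = {CHILD_TYPE: eager_parent_mapping(PARENT_TYPE)}
print(json.dumps(mapping_body, indent=2))
```

The trade-off is that the cost of rebuilding global ordinals moves from the
first query after a refresh to the refresh itself.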

You may want to consider increasing the refresh_interval so that global
ordinals remain valid for longer.
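
A minimal sketch of the request that raises the refresh interval through the
update index settings API. The index name `my_index` and the value `30s` are
placeholders rather than recommendations, and the snippet only builds the
request instead of sending it.

```python
import json


def refresh_interval_request(index_name, interval):
    """Return (method, path, body) for an update-index-settings call."""
    body = {"index": {"refresh_interval": interval}}
    return "PUT", "/{}/_settings".format(index_name), json.dumps(body)


# Placeholder index name and interval; pick values that suit your workload.
method, path, body = refresh_interval_request("my_index", "30s")
print(method, path, body)
```

A longer interval means newly indexed documents take longer to become
searchable, so this trades freshness for fewer global ordinal rebuilds.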


On 25 August 2014 16:48, Mark Greene <[email protected]> wrote:

> Hi Adrien,
>
> Thanks for reaching out.
>
> We actually were excited to see the performance improvements stated in the
> 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance
> improvement, but it wasn't orders of magnitude and queries are still running
> very slowly.
>
> We also tried your suggestion of using the 'preference=_local' query param
> but we didn't see any difference there. Additionally, running the query 10
> times, we saw no improvement in speed.
>
> Currently, the only major performance increase we've seen with
> parent/child queries is dropping down to 1 data node, at which point we see
> queries executing well under the 100ms mark.
>
>
>
>
> On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote:
>
>> Hi Mark,
>>
>> Given that you had 1 replica in your first setup, it could take several
>> queries to warm up the field data cache completely. Does the query still
>> take 16 seconds if you run it, e.g., 10 times? (3 should be enough, but
>> just to be sure.)
>>
>> Does it change anything if you query elasticsearch with
>> preference=_local? This should be equivalent to your single-node setup, so
>> it would be interesting to see if that changes something.
>>
>> As a side note, you might want to try out a more recent version of
>> Elasticsearch since parent/child performance improved quite significantly
>> in 1.2.0 because of
>> https://github.com/elasticsearch/elasticsearch/pull/5846
>>
>>
>>
>> On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene <[email protected]> wrote:
>>
>>> I wanted to update the list with an interesting piece of information. We
>>> found that when we took one of our two data nodes out of the cluster,
>>> leaving just one data node with no replicas, the query performance
>>> increased dramatically. The queries are now returning in <100ms on
>>> subsequent executions which is what we'd expect to see as a result of the
>>> data being stored in the field data cache.
>>>
>>> Is it possible that there is some kind of inefficient code path when a
>>> query is spread across primary and replica shards?
>>>
>>>
>>> On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
>>>>
>>>> We are experiencing slow parent/child queries even when we run the
>>>> query a second time, and I wanted to know if this is just the limit of this
>>>> feature within Elasticsearch. According to the ES docs
>>>> (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html),
>>>> parent/child queries can be 5-10x slower and consume a lot of memory.
>>>>
>>>> My impression has been that as long as we give ES enough memory via the
>>>> field data cache, subsequent queries would be quicker than the first time
>>>> it is executed. We are seeing the following query take ~16 seconds to
>>>> complete every time.
>>>>
>>>>
>>>> {
>>>>     "from": 0,
>>>>     "size": 100,
>>>>     "query": {
>>>>         "filtered": {
>>>>             "query": {
>>>>                 "match_all": {}
>>>>             },
>>>>             "filter": {
>>>>                 "bool": {
>>>>                     "must": [
>>>>                         {
>>>>                             "term": {
>>>>                                 "oid": 61
>>>>                             }
>>>>                         },
>>>>                         {
>>>>                             "has_child": {
>>>>                                 "type": "social",
>>>>                                 "query": {
>>>>                                     "bool": {
>>>>                                         "should": [
>>>>                                             {
>>>>                                                 "term": {
>>>>                                                     "engagement.type": "like"
>>>>                                                 }
>>>>                                             },
>>>>                                             {
>>>>                                                 "term": {
>>>>                                                     "content.remote_id": "20697868961_10152270678178962"
>>>>                                                 }
>>>>                                             }
>>>>                                         ]
>>>>                                     }
>>>>                                 }
>>>>                             }
>>>>                         }
>>>>                     ]
>>>>                 }
>>>>             }
>>>>         }
>>>>     },
>>>>     "fields": "id",
>>>>     "sort": [
>>>>         {
>>>>             "_score": {}
>>>>         },
>>>>         {
>>>>             "id": {
>>>>                 "order": "asc"
>>>>             }
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>>
>>>> The index (which has 5 shards with 1 replica shard) we are testing this
>>>> on has 2.2 million parent documents and 1.1 million child documents.
>>>>
>>>> We are running our two data nodes on r3.2xlarge instances, which have 8
>>>> CPUs, 60GB of RAM, and SSD storage.
>>>>
>>>> Our ES data nodes have 30G of heap and the field data cache is only
>>>> consuming around 3GB right now, with no cache evictions. The field
>>>> data cache is also allowed to grow to 75% of the available heap.
>>>>
>>>> I'm looking to understand whether this is a limitation of parent/child or
>>>> whether there is additional configuration, beyond the defaults, that
>>>> would help speed these queries up.
>>>>
>>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/a6442545-edc0-4e21-9696-925aae517762%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Adrien Grand
>>

