Thanks for the explanation.
A follow-up question: if the filter is cached for a specific value, say
{ "term": { "status": "paid" } }, will this somehow magically speed up the
query when searching for "status": "unpaid"? I'm not talking about a "not"
operation, but simply replacing the value with something else (like when
creating an index in an RDBMS).
2014-11-11 21:35 GMT+01:00 Ivan Brusic <[email protected]>:
> The status filter cache will indeed contain all entries. And technically,
> the cache is per segment, and not across all documents, but this should be
> transparent.
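
To make the point above concrete, here is a toy model in Python, not Elasticsearch or Lucene code. It assumes a cache keyed by (segment, field, value) that holds one bit per document in the segment; the names `segments`, `filter_cache`, `term_bitset` and `bool_must` are hypothetical and exist only for this illustration. Under those assumptions, each cached bitset covers every document in its segment, so the order of the filters does not change what ends up in the cache.

```python
# Toy model: two "segments", each a list of documents.
segments = [
    [{"status": "paid", "account": "123456"},
     {"status": "unpaid", "account": "123456"},
     {"status": "paid", "account": "999"}],
    [{"status": "paid", "account": "999"},
     {"status": "paid", "account": "123456"}],
]

# Hypothetical cache: (segment index, field, value) -> one boolean per document.
filter_cache = {}


def term_bitset(seg_id, field, value):
    """Return the bitset of a term filter over the whole segment, caching it."""
    key = (seg_id, field, value)
    if key not in filter_cache:
        filter_cache[key] = [doc.get(field) == value for doc in segments[seg_id]]
    return filter_cache[key]


def bool_must(term_filters):
    """AND the per-segment bitsets of all term filters; return (segment, doc) hits."""
    hits = []
    for seg_id, docs in enumerate(segments):
        bitsets = [term_bitset(seg_id, field, value) for field, value in term_filters]
        for doc_id in range(len(docs)):
            if all(bits[doc_id] for bits in bitsets):
                hits.append((seg_id, doc_id))
    return hits


hits_a = bool_must([("status", "paid"), ("account", "123456")])
hits_b = bool_must([("account", "123456"), ("status", "paid")])
```

In this sketch both orderings return the same hits, and the entry cached for "status": "paid" in the first segment marks the paid invoices of every account, not only those of account 123456.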
>
> Caching is enabled by default for the term filters, but disabled for the
> bool filter. You can enable it if you think users will be reusing the
> filter.
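
A minimal sketch of what enabling the cache on the bool filter could look like, written as a Python dict that serializes to the request body. It assumes the 1.x-era filter DSL used in this thread, where a filter accepts a `_cache` flag next to its clauses; the placement of `_cache` here is an assumption to check against the documentation for the version in use.

```python
import json

# Same filters as in the original question; "_cache" is the assumed opt-in flag
# for caching the combined result of the bool filter.
request_body = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "paid"}},
                        {"term": {"account": "123456"}},
                    ],
                    "_cache": True,
                }
            }
        }
    }
}

# Serialize to the JSON that would be sent as the search request body.
print(json.dumps(request_body, indent=2))
```

Whether this pays off depends on how often the exact same combination of clauses is reused; a bool filter that includes a per-account term is only reused by requests for that account.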
>
> --
> Ivan
>
> On Tue, Nov 11, 2014 at 3:23 AM, Lasse Schou <[email protected]> wrote:
>
>> Hi,
>>
>> I have a search request that uses a couple of filters. I'm using
>> bool+must, and I'm trying to optimize the request as much as possible.
>>
>> - Some filters are used by all users of my platform, but aren't very
>> selective.
>> - Some filters are very specific to individual users, and are highly
>> selective.
>>
>> I've read that I should use the most selective filters first, to ease the
>> work performed by the subsequent filters.
>>
>> However, one thing that's not 100% clear is how the filter cache bitmaps
>> work. Do they store the result of a filter as performed across the entire
>> dataset, or do they store the filtered result of the previous filter's
>> output?
>>
>> Example. Querying the paid invoices of an account (all users use the
>> "status" filter, but it's not very selective):
>>
>> { "query":
>>   { "filtered":
>>     { "filter":
>>       { "bool":
>>         { "must": [
>>           { "term": { "status": "paid" } },
>>           { "term": { "account": "123456" } }
>>         ]}
>>       }
>>     }
>>   }
>> }
>>
>> Following the advice of using the most highly selective filter first, I
>> should place the "account" filter first. On the other hand I want to be
>> sure that all users will re-use the cached output of the "status" filter.
>>
>> Question: will the "status" filter cache contain *all* paid invoices of
>> all accounts, no matter in which order I use the filters?
>>
>> The above code is just an example - I'm trying to optimize the code for a
>> dataset of 1B+ documents, so please take this into consideration.
>>
>> Thanks,
>> Lasse
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/7ea47711-38c1-4bc7-bc7c-41d85fb5cf81%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/7ea47711-38c1-4bc7-bc7c-41d85fb5cf81%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>