Thanks for the clarification - this was very helpful.

On Friday, March 20, 2015 at 5:45:59 AM UTC-7, Jörg Prante wrote:
>
> Caching filters are implemented in ES, not in Lucene. E.g. 
> org.elasticsearch.common.lucene.search.CachedFilter is a class that 
> implements cached filters on top of the Lucene filter class.
>
> The "format" is not only bitsets. The Lucene filter instance is cached, no 
> matter whether it holds doc sets, bit sets, or something else. ES code extends 
> Lucene filters with several methods for fast evaluation and traversal.
>
> ES evaluates the filter in the given filter chain order, from outer to 
> inner (also called "top down").
>
> When a series of boolean filters (i.e. should/must/must_not) is used, they 
> can be evaluated efficiently by composition. See 
> org.elasticsearch.common.lucene.search.XBooleanFilter for the composition 
> algorithm.
>
> Field data will be loaded when a field is used for operations like filter 
> or sort. The higher the cardinality, the more effort is needed. This is 
> because the index is inverted.
>
> Jörg
>
>
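To make "evaluated efficiently by composition" concrete, here is a toy sketch of how should/must/must_not clause results could be combined with bitwise operations. It is only an illustration under stated assumptions: bitsets are modelled as Python ints, the helper names (bitset, doc_ids, compose_bool) are made up for this example, and this is not the actual XBooleanFilter code.

```python
# Toy model: each clause result over a segment is a bitset, stored as a
# Python int where bit i set means doc i matches. All names are hypothetical.

def bitset(matching_ids):
    """Build an int bitset from an iterable of matching doc ids."""
    bits = 0
    for doc_id in matching_ids:
        bits |= 1 << doc_id
    return bits

def doc_ids(bits):
    """Return the sorted doc ids whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

def compose_bool(num_docs, must=(), should=(), must_not=()):
    """Combine clause bitsets without visiting documents one by one."""
    result = (1 << num_docs) - 1      # start with every doc in the segment
    for bits in must:
        result &= bits                # every must clause has to match
    if should:
        any_should = 0
        for bits in should:
            any_should |= bits        # at least one should clause matches
        result &= any_should
    for bits in must_not:
        result &= ~bits               # drop docs matching a must_not clause
    return result

x_matches = bitset([1, 3, 5])
y_matches = bitset([3, 4, 5, 7])
excluded = bitset([5])
combined = compose_bool(8, must=[x_matches, y_matches], must_not=[excluded])
print(doc_ids(combined))
```

Each clause costs one bitwise operation over the whole segment, regardless of how many documents the earlier clauses matched.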
> On Fri, Mar 20, 2015 at 3:30 AM, Ashish Mishra <[email protected]> wrote:
>
>> Not sure I understand the difference between composable vs. cacheable.  
>> Can filters be cached without using bitsets?  What format are the results 
>> stored in, if not as bitsets?
>>
>> In the example below, would the string range field "y" filter be 
>> evaluated on every document in the index, or just on the documents matching 
>> the previous field "x" filter?
>>
>> Also, will "y" field data be loaded for all documents in the index, or 
>> just for the documents matching the previous filter?
>>
>>
>>
>> On Thursday, March 19, 2015 at 3:21:12 AM UTC-7, Jörg Prante wrote:
>>>
>>> There are several concepts:
>>>
>>> - filter operation (bool, range/geo/script)
>>> - filter composition (composable or not, composable means bitsets are 
>>> used)
>>> - filter caching (ES stores filter results or not, if not cached, ES 
>>> must walk doc-by-doc to apply filter)
>>>
>>> #1 says you should take care what kind of inner filter the and/or/not 
>>> filter uses, and then arrange the filters in the right order to avoid 
>>> unnecessary complexity.
>>> #2: most filters are cacheable, but not by default. This doc tries to 
>>> explain how the "and" filter consists of inner filter clauses and what 
>>> happens because default caching is off. I cannot see how this implies 
>>> bitsets.
>>> #3: correct interpretation.
>>>
>>> The use of bitsets is an indicator of composable filters: the 
>>> should/must/must_not filters use an internal Lucene bitset implementation 
>>> for efficient computation.
>>>
>>> Jörg
>>>
>>>
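The caching point above (stored results versus walking doc-by-doc) can be sketched with a small toy model. This assumes a plain dict keyed by a filter description; the class ToyFilterCache and its methods are hypothetical and do not reflect how ES actually stores cached filters.

```python
class ToyFilterCache:
    """Toy model of a filter cache: stores the set of matching doc ids."""

    def __init__(self):
        self._cache = {}
        self.evaluations = 0   # number of per-document predicate checks

    def matching_docs(self, key, predicate, docs, use_cache=True):
        # Cached filter: reuse the stored result, no documents are visited.
        if use_cache and key in self._cache:
            return self._cache[key]
        # Uncached filter: walk doc-by-doc and apply the predicate.
        self.evaluations += len(docs)
        result = frozenset(i for i, doc in enumerate(docs) if predicate(doc))
        if use_cache:
            self._cache[key] = result
        return result

docs = [{"x": "a"}, {"x": "b"}, {"x": "a"}]
cache = ToyFilterCache()
key = ("term", "x", "a")

def is_a(doc):
    return doc["x"] == "a"

first = cache.matching_docs(key, is_a, docs)     # walks 3 docs, stores result
second = cache.matching_docs(key, is_a, docs)    # served from the cache
uncached = cache.matching_docs(key, is_a, docs, use_cache=False)  # walks again
print(sorted(first), cache.evaluations)
```

In this model a filter that will not be repeated gains nothing from the cache, which is the usual reason to switch caching off for a one-off filter.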
>>> On Thu, Mar 19, 2015 at 5:58 AM, Ashish Mishra <[email protected]> 
>>> wrote:
>>>
>>>> I'm trying to optimize filter queries for performance and am slightly 
>>>> confused by the online docs.  Looking at:
>>>>
>>>> 1) https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets
>>>> 2) http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-and-filter.html
>>>> 3) http://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_order.html
>>>>
>>>> #1 says that Bool filter uses bitsets, while And/Or/Not does doc-by-doc 
>>>> matching.
>>>> #2 says that And result is optionally cacheable (implying that it uses 
>>>> bitsets).
>>>> #3 says that Bool does doc-by-doc matching if the inner filters are not 
>>>> cacheable.
>>>>
>>>> This is confusing. Is there a clear guideline on when bitsets are used?
>>>>
>>>> Let's say I have two high-cardinality fields, x and y.  Field data for 
>>>> y is loaded into memory, while x is not.  What is the optimal way to 
>>>> structure this query?
>>>>
>>>>       "filter": {
>>>>         "and": [
>>>>           {
>>>>             "term": {
>>>>               "x": "F828477AF7",
>>>>               // Don't want to cache since query will not be repeated
>>>>               "_cache": false
>>>>             }
>>>>           },
>>>>           {
>>>>             "range": {
>>>>               "y": {
>>>>                 // String range query, should only be executed on
>>>>                 // result of previous filters
>>>>                 "gt": "CB70V63BD8AE"
>>>>               }
>>>>             }
>>>>           }
>>>>         ]
>>>>       }
>>>>
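For comparison, the same two clauses can be expressed either with the "and" filter used above or with the "bool" filter discussed in the linked blog post. The sketch below only builds both request bodies as Python dicts, assuming the ES 1.x filter DSL; the helper names and_filter and bool_filter are hypothetical, and the thread does not settle which form performs better for this query.

```python
import json

def and_filter(clauses):
    """Wrap clauses in an "and" filter (hypothetical helper)."""
    return {"and": list(clauses)}

def bool_filter(must):
    """Wrap clauses in a "bool" filter with only must clauses."""
    return {"bool": {"must": list(must)}}

# Values copied from the query above; "_cache": False turns off caching
# for the one-off term filter.
term_x = {"term": {"x": "F828477AF7", "_cache": False}}
range_y = {"range": {"y": {"gt": "CB70V63BD8AE"}}}

and_body = {"filter": and_filter([term_x, range_y])}
bool_body = {"filter": bool_filter([term_x, range_y])}

# Serialize the bool variant as the JSON that would be sent to ES.
print(json.dumps(bool_body, indent=2))
```

In both bodies the term clause is listed before the range clause, matching the order in the original query.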
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/52dd306b-d229-462b-8b3c-b9cb2fff8c5f%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

