Thanks for the clarification - this was very helpful.
On Friday, March 20, 2015 at 5:45:59 AM UTC-7, Jörg Prante wrote:
>
> Caching filters are implemented in ES, not in Lucene. E.g.
> org.elasticsearch.common.lucene.search.CachedFilter is a class that
> implements cached filters on top of the Lucene filter class.
>
> The "format" is not only bitsets. The Lucene filter instance is cached, no
> matter if it is doc sets or bit sets or whatever. ES code extends Lucene
> filters by several methods for fast evaluation and traversal.
>
> ES evaluates the filter in the given filter chain order, from outer to
> inner (also called "top down").
>
> When a series of boolean filters (i.e. should/must/must_not) is used, they
> can be evaluated efficiently by composition. See
> org.elasticsearch.common.lucene.search.XBooleanFilter for the composition
> algorithm.
>
> Field data will be loaded when a field is used for operations like filter
> or sort. The higher the cardinality, the more effort is needed. This is
> because the index is inverted.
>
> Jörg
>
>
> On Fri, Mar 20, 2015 at 3:30 AM, Ashish Mishra <[email protected]> wrote:
>
>> Not sure I understand the difference between composable vs. cacheable.
>> Can filters be cached without using bitsets? What format are the results
>> stored in, if not as bitsets?
>>
>> In the example below, would the string range field "y" filter be
>> evaluated on every document in the index, or just on the documents matching
>> the previous field "x" filter?
>>
>> Also, will "y" field data be loaded for all documents in the index, or
>> just for the documents matching the previous filter?
>>
>>
>>
>> On Thursday, March 19, 2015 at 3:21:12 AM UTC-7, Jörg Prante wrote:
>>>
>>> There are several concepts:
>>>
>>> - filter operation (bool, range/geo/script)
>>> - filter composition (composable or not, composable means bitsets are
>>> used)
>>> - filter caching (ES stores filter results or not, if not cached, ES
>>> must walk doc-by-doc to apply filter)
>>>
>>> #1 says you should take care what kind of inner filter the and/or/not
>>> filter uses, and then you should arrange filters in the right order to
>>> avoid unnecessary complexity
>>> #2 most of the filters are cacheable, but not by default. These docs try
>>> to explain how the "and" filter consists of inner filter clauses and what
>>> is happening because default caching is off. I cannot see this implying
>>> bitsets.
>>> #3 correct interpretation
>>>
>>> The use of bitsets is a pointer for composable filters, these
>>> should/must/must_not filters use an internal Lucene bitset implementation
>>> for efficient computation.
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Mar 19, 2015 at 5:58 AM, Ashish Mishra <[email protected]>
>>> wrote:
>>>
>>>> I'm trying to optimize filter queries for performance and am slightly
>>>> confused by the online docs. Looking at:
>>>>
>>>> 1) https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets
>>>> 2) http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-and-filter.html
>>>> 3) http://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_order.html
>>>>
>>>> #1 says that Bool filter uses bitsets, while And/Or/Not does doc-by-doc
>>>> matching.
>>>> #2 says that And result is optionally cacheable (implying that it uses
>>>> bitsets).
>>>> #3 says that Bool does doc-by-doc matching if the inner filters are not
>>>> cacheable.
>>>>
>>>> This is confusing, is there a clear guideline on when bitsets are used?
>>>>
>>>> Let's say I have two high-cardinality fields, x and y. Field data for
>>>> y is loaded into memory, while x is not.
>>>> What is the optimal way to structure this query?
>>>>
>>>> "filter": {
>>>>   "and": [
>>>>     {
>>>>       "term": {
>>>>         "x": "F828477AF7",
>>>>         "_cache": false // Don't want to cache since query will not be repeated
>>>>       }
>>>>     },
>>>>     {
>>>>       "range": {
>>>>         "y": {
>>>>           "gt": "CB70V63BD8AE" // String range query, should only be executed on result of previous filters
>>>>         }
>>>>       }
>>>>     }
>>>>   ]
>>>> }
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/52dd306b-d229-462b-8b3c-b9cb2fff8c5f%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
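
To make the distinction discussed in this thread concrete, here is a toy Python sketch of the two evaluation strategies: composing per-filter bitsets with a bitwise AND, versus applying filters one after another so that later filters only see documents that passed the earlier ones. It is only a conceptual model under stated assumptions, not how Elasticsearch or Lucene implement filters. All function names and the three sample documents (including their "y" values) are made up for illustration; only the "x" term and the "y" lower bound come from the query in the thread.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


def to_bitset(docs: List[Doc], pred: Predicate) -> int:
    """Evaluate pred on EVERY document; bit i is set if document i matches."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if pred(doc):
            bits |= 1 << doc_id
    return bits


def compose_and(bitsets: List[int], num_docs: int) -> int:
    """Combine per-filter bitsets with bitwise AND (the 'composable' case)."""
    result = (1 << num_docs) - 1  # start with all documents set
    for bits in bitsets:
        result &= bits
    return result


def bitset_to_ids(bits: int) -> List[int]:
    """Turn a bitset back into a sorted list of matching document ids."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


def chain_doc_by_doc(docs: List[Doc], preds: List[Predicate]) -> List[int]:
    """Apply filters in order; each filter only tests the current survivors."""
    survivors = list(range(len(docs)))
    for pred in preds:
        survivors = [i for i in survivors if pred(docs[i])]
    return survivors


# Hypothetical sample documents, invented for this sketch.
docs = [
    {"x": "F828477AF7", "y": "CB70V63BD8AF"},
    {"x": "F828477AF7", "y": "AA00000000"},
    {"x": "0000000000", "y": "ZZ99999999"},
]


def term_x(doc: Doc) -> bool:
    return doc["x"] == "F828477AF7"


def range_y(doc: Doc) -> bool:
    return doc["y"] > "CB70V63BD8AE"  # plain string comparison


composed = bitset_to_ids(
    compose_and([to_bitset(docs, term_x), to_bitset(docs, range_y)], len(docs))
)
chained = chain_doc_by_doc(docs, [term_x, range_y])
print(composed, chained)
```

Both strategies select the same documents. The difference is in the work done: the bitset version evaluates every filter against every document but produces results that can be reused and combined cheaply, while the chained version evaluates the range predicate only on documents that already matched the term predicate.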
