On Tue, Jul 15, 2014 at 3:25 PM, David Smith <[email protected]> wrote:
> Thanks, Adrien. That brings me closer.
>
> So when the documentation says doc values do not support filtering, it's
> talking about fielddata filtering for what's loaded into memory (and not
> filtering as part of a query... say a term filter).
>

Exactly.

> For further clarification - can a field that is not analyzed and only kept
> as doc values be used for querying/filtering (say a term filter on a
> numeric field or a match query on a string field)? Or does all
> querying/filtering require the field to be in the inverted index?
>

Doc values play no role when filtering (except for some filters that support
a `fielddata` mode, such as the range filter[1]). So if your field has
`index: no`, you cannot use it in filters, and if it has
`index: not_analyzed`, then you can, no matter whether doc values are enabled
or not.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html#_execution

> What I'm trying to understand is how we can optimize querying/filtering in a
> large index (5 billion documents / 1 TB). It's very hard to run a simple
> term filter because a bitset filter will need to be calculated that
> includes every single document. Wouldn't that utilize a lot of memory? Is
> there a way to speed that up?
>

If your filters are unlikely to be reused, then you should disable caching
for them by setting `_cache` to `false`. Caching filters only makes filtering
faster when the likelihood of reusing those filters is high.

-- 
Adrien Grand
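The rules in the reply above can be illustrated with a small sketch that builds Elasticsearch 1.x-style request bodies as plain Python dicts (no cluster needed). Assumptions: the field names `status`, `score` and `payload` and the mapping are hypothetical; `can_filter_on` is a hypothetical helper that just encodes the rule "filterable iff the field is indexed, regardless of doc values"; the `filtered` query, the `_cache` flag on the term filter and the range filter's `execution: fielddata` option follow the 1.x query DSL referenced in [1], and whether a given mapping combination is accepted by your ES version should be checked against its docs.

```python
import json

# Hypothetical 1.x-style mapping, for illustration only.
MAPPING = {
    "properties": {
        # Indexed as a single term and also stored as doc values.
        "status": {"type": "string", "index": "not_analyzed", "doc_values": True},
        # Numeric field, indexed by default, with doc values.
        "score": {"type": "long", "doc_values": True},
        # Not indexed: only doc values (assumed to be accepted by the mapping).
        "payload": {"type": "long", "index": "no", "doc_values": True},
    }
}


def can_filter_on(mapping, field):
    """Hypothetical helper encoding the rule from the reply: a field can be
    used in (index-based) filters iff it is indexed; doc values don't matter."""
    props = mapping["properties"][field]
    return props.get("index") != "no"


def term_filter_query(field, value, cache=False):
    """Filtered query with a term filter; `_cache: false` skips building and
    keeping a cached bitset for a filter that is unlikely to be reused."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value, "_cache": cache}},
            }
        }
    }


def range_filter_query(field, gte, lt, execution="fielddata"):
    """Filtered query with a range filter evaluated in `fielddata` mode, one of
    the exceptions where doc values/field data take part in filtering."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {field: {"gte": gte, "lt": lt}, "execution": execution}
                },
            }
        }
    }


if __name__ == "__main__":
    for name in MAPPING["properties"]:
        print(name, "filterable:", can_filter_on(MAPPING, name))
    print(json.dumps(term_filter_query("status", "active"), indent=2))
    print(json.dumps(range_filter_query("score", 10, 20), indent=2))
```

Disabling `_cache` only avoids keeping the bitset around after the request; it does not remove the per-segment work of evaluating the filter, so it helps most when filter values are highly variable.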
