On Tue, Jul 15, 2014 at 3:25 PM, David Smith <[email protected]>
wrote:

> Thanks, Adrien. That brings me closer.
>
> So when the documentation says doc values do not support filtering, it's
> talking about fielddata filtering for what's loaded into memory (and not
> filtering as part of a query... say term filter).
>

Exactly.


> For further clarification - can a field that is not analyzed and only kept
> as doc values be used for querying/filtering (say a term filter on a
> numeric field or match query on a string field)? Or does all
> querying/filtering require the field to be in the uninverted index?
>

Doc values play no role when filtering (except for some filters that
support a `fielddata` mode, such as the range filter[1]). So if your field
has `index: no` you cannot use it in filters, and if it has `index:
not_analyzed` then you can, no matter whether doc values are enabled or not.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html#_execution
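
To make the distinction concrete, here is a minimal sketch in Python that
only builds the JSON bodies (it does not talk to a cluster). It assumes the
1.x mapping syntax where doc values are requested through the `fielddata`
format setting; the field names `status` and `timestamp` and the helper
function names are hypothetical, chosen for illustration.

```python
import json


def string_field_mapping(field, indexed=True):
    """Mapping for a string field stored as doc values.

    `index` decides whether the field can be used in filters;
    the doc values setting does not affect that. (Assumed 1.x syntax.)
    """
    return {
        "properties": {
            field: {
                "type": "string",
                # "not_analyzed": filterable; "no": not filterable
                "index": "not_analyzed" if indexed else "no",
                "fielddata": {"format": "doc_values"},
            }
        }
    }


def range_filter(field, gte, lte, execution="index"):
    """Range filter; execution="fielddata" is the mode that reads
    fielddata (and so doc values) instead of the inverted index."""
    return {
        "range": {
            field: {"gte": gte, "lte": lte},
            "execution": execution,
        }
    }


if __name__ == "__main__":
    # hypothetical field names, for illustration only
    print(json.dumps(string_field_mapping("status"), indent=2))
    print(json.dumps(range_filter("timestamp", 0, 1000, "fielddata"), indent=2))
```

With `indexed=False` the mapping still carries doc values, but a term filter
on that field would not match anything, since there is nothing in the
inverted index to look up.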


> What I'm trying to understand is how we can optimize querying/filtering in
> a large index (5 billion documents / 1 TB). It's very hard to run a simple
> term filter because a bitset filter will need to be calculated that
> includes every single document. Wouldn't that utilize a lot of memory? Is
> there a way to speed that up?
>

If your filters are unlikely to be reused, then you should disable caching
for them by setting `_cache` to false. Caching filters only makes filtering
faster when the likelihood of reusing them is high.
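
As a rough sketch of what that looks like, the Python snippet below builds a
filtered query whose term filter opts out of the filter cache. It only
constructs the request body; the field name `user_id`, the value, and the
helper name are hypothetical.

```python
import json


def uncached_term_query(field, value):
    """Filtered query with a term filter that is not cached.

    Setting "_cache" to false inside the filter asks Elasticsearch
    not to keep the resulting bitset around for reuse.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "term": {field: value, "_cache": False},
                },
            }
        }
    }


if __name__ == "__main__":
    # hypothetical field and value, for illustration only
    print(json.dumps(uncached_term_query("user_id", 42), indent=2))
```

For a one-off filter this avoids filling the cache with an entry that will
never be read again.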

-- 
Adrien Grand
