Thanks for the input, Jörg, but millisecond granularity is 
application-critical here. I'm trying to work with the doc_values fielddata 
format for time. Currently I'm sending over ISO strings, and Elasticsearch is 
parsing them into dates internally, which I understand will stop me from 
using doc_values. I can turn that off, though: are there any dire 
performance/usability implications of forcing it to store my dates as 
strings rather than parsing them into dates?
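
For concreteness, the mapping I have in mind looks something like this (just 
a sketch; "mytype" is a placeholder for my actual type), keeping "time" as a 
not_analyzed string so doc_values still applies:

{
    "mappings": {
        "mytype": {
            "properties": {
                "time": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": true
                }
            }
        }
    }
}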



On Thursday, October 2, 2014 4:12:35 PM UTC-7, Jörg Prante wrote:
>
> If you really want to sort on a timestamp, use a discretization strategy 
> for better performance.
>
> If you use millisecond-resolution timestamps, ES will have to load all the 
> values of the field, because they are practically all unique. This is quite 
> massive.
>
> But if you store the year, month, day, hour, minute, etc. in separate 
> fields, down to the resolution you need (e.g. seconds), you create fields 
> with far fewer unique values.
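>
> For example (a sketch, with illustrative values), each document would then 
> carry its timestamp broken out into low-cardinality integer fields:
>
> { "year": 2014, "month": 10, "day": 2, "hour": 16, "min": 12, "sec": 35 }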
>
> Then you can sort on multiple fields with very small memory consumption, 
> something like
>
> "sort": [
>         { "year":   { "order": "asc" }},
>         { "month": { "order": "asc" }},
>         { "day": { "order": "asc" }},
>         { "hour": { "order": "asc" }},
>         { "min": { "order": "asc" }},
>         { "sec": { "order": "asc" }}
>     ]
>
> Jörg
>
>
>
> On Thu, Oct 2, 2014 at 3:29 AM, Dave Galbraith <[email protected]> wrote:
>
>> Hi! So I have millions and millions of documents in Elasticsearch, each 
>> of which has a field called "time". I need the results of my queries to 
>> come back in chronological order, so I put "sort":{"time":{"order":"asc"}} 
>> in all my queries (full query sketch in the P.S. below). This was going 
>> great on smaller data sets, but then Elasticsearch started sending me 
>> 500s, and circuit breaker exceptions started showing up in the logs with 
>> "data for field time would be too large". So I checked out 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>> and that looks a lot like what I've been seeing: it seems to be pulling 
>> all the millions of time values into memory even when they're not 
>> relevant to my query. What are my options for fixing this? I can't 
>> compromise on chronological order; it's at the heart of my application. 
>> "More memory" would be a short-term fix, but the idea is to scale this 
>> thing to trillions and trillions of points, and that's a race I don't 
>> want to run. Can I make these exceptions go away without totally tanking 
>> performance? Thanks!
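>>
>> P.S. For reference, my queries look roughly like this (a sketch; 
>> match_all stands in for the real query):
>>
>> {
>>     "query": { "match_all": {} },
>>     "sort": [ { "time": { "order": "asc" } } ]
>> }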
>>
