Thanks for the input, Jörg, but millisecond granularity is
application-critical here. I'm trying to work with the doc_values fielddata
format for time. Currently I'm sending over ISO strings, and Elasticsearch is
parsing them into dates internally, which I understand will stop me from using
doc_values. I can turn that parsing off, though: are there any dire
performance/workability implications from forcing it to store my dates as
strings rather than parsing them into dates?
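
For concreteness, here's roughly the kind of mapping I'm weighing (the type
and field names are just placeholders I made up): one field kept as a date
with the doc_values fielddata format, and one storing the raw ISO string
unparsed as a not_analyzed string:

"mappings": {
  "point": {
    "properties": {
      "time_as_date": {
        "type": "date",
        "format": "date_time",
        "fielddata": { "format": "doc_values" }
      },
      "time_as_string": {
        "type": "string",
        "index": "not_analyzed",
        "doc_values": true
      }
    }
  }
}

My (possibly mistaken) understanding is that fixed-width ISO-8601 timestamps
in a single timezone still sort chronologically when compared as plain
strings, so the string variant shouldn't break ordering; what I don't know is
what it costs in memory or sort speed compared to the parsed date.
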
On Thursday, October 2, 2014 4:12:35 PM UTC-7, Jörg Prante wrote:
>
> If you really want to sort on a timestamp, use a discretization strategy for
> better performance.
>
> If you use a millisecond-resolution timestamp, ES will have to load all the
> values of the field, because they are unique. This is quite massive.
>
> But if you store the year, month, day, hour count, minute count, etc. in
> different fields, down to the resolution you want (e.g. seconds), you can
> create fields with far fewer unique values.
>
> Then you can sort on multiple fields with very small memory consumption,
> something like
>
> "sort": [
> { "year": { "order": "asc" }},
> { "month": { "order": "asc" }},
> { "day": { "order": "asc" }},
> { "hour": { "order": "asc" }},
> { "min": { "order": "asc" }},
> { "sec": { "order": "asc" }}
> ]
>
> Jörg
>
>
>
> On Thu, Oct 2, 2014 at 3:29 AM, Dave Galbraith <[email protected]> wrote:
>
>> Hi! So I have millions and millions of documents in my Elasticsearch,
>> each one of which has a field called "time". I need the results of my
>> queries to come back in chronological order. So I put a
>> "sort":{"time":{"order":"asc"}} in all my queries. This was going great
>> on smaller data sets but then Elasticsearch started sending me 500s and
>> circuit breaker exceptions started showing up in the logs with "data for
>> field time would be too large". So I checked out
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>>
>> and that looks a lot like what I've been seeing: it seems like it's trying
>> to pull all the millions of time values into memory even when they're not
>> relevant to my query. What are my options for fixing this? I can't
>> compromise on chronological order; it's at the heart of my application.
>> "More memory" would be a short-term fix, but the idea is to scale this
>> thing to trillions and trillions of points, and that's a race I don't want
>> to run. Can I make these exceptions go away without totally tanking
>> performance? Thanks!
>>
>
>