Actually, the date section of
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
seems to suggest that doc_values is valid for dates, even though dates
aren't mentioned in
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html.
Maybe I can keep date parsing enabled and still use doc_values...
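
For reference, here is a minimal sketch of the mapping I have in mind. It
assumes the doc_values flag described on the core types page really does
apply to date fields, which I have not confirmed. The type name "event" is a
made-up placeholder, "time" is my actual field, and the Python is only there
to build and print the JSON body.

```python
import json


def date_mapping_with_doc_values(type_name="event", field="time"):
    """Build a mapping body that requests doc values on a date field.

    Assumption: "doc_values": True is accepted on date fields, as the
    core types page appears to suggest. "event" is a hypothetical type name.
    """
    return {
        type_name: {
            "properties": {
                field: {
                    "type": "date",       # ISO strings are still parsed as dates
                    "doc_values": True,   # keep sort data on disk, not in heap
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating or updating the mapping.
    print(json.dumps(date_mapping_with_doc_values(), indent=2))
```

If this works, the ISO strings keep being parsed as dates and the string
workaround is unnecessary.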
On Thursday, October 2, 2014 5:28:23 PM UTC-7, Dave Galbraith wrote:
>
> Thanks for the input, Jörg, but millisecond granularity is
> application-critical here. I'm trying to use the doc_values fielddata
> format for the time field. Currently I'm sending over ISO strings, and
> Elasticsearch parses them into dates internally, which I understand will
> stop me from using doc_values. I can turn this off, though: are there any
> dire performance/workability implications from forcing it to store my
> dates as strings and not parse them into dates?
>
>
>
> On Thursday, October 2, 2014 4:12:35 PM UTC-7, Jörg Prante wrote:
>>
>> If you really want to sort on a timestamp, use a discretization strategy
>> for better performance.
>>
>> If you use a millisecond-resolution timestamp, ES will have to load all
>> the values of the field, because they are unique. This is quite massive.
>>
>> But if you store the year, month, day, hour, minute, etc. in separate
>> fields, down to the resolution you want (e.g. seconds), you get fields
>> with far fewer unique values.
>>
>> Then you can sort on multiple fields with very small memory consumption,
>> something like
>>
>> "sort": [
>> { "year": { "order": "asc" }},
>> { "month": { "order": "asc" }},
>> { "day": { "order": "asc" }},
>> { "hour": { "order": "asc" }},
>> { "min": { "order": "asc" }},
>> { "sec": { "order": "asc" }}
>> ]
>>
>> Jörg
>>
>>
>>
>> On Thu, Oct 2, 2014 at 3:29 AM, Dave Galbraith <[email protected]>
>> wrote:
>>
>>> Hi! So I have millions and millions of documents in my Elasticsearch
>>> cluster, each one of which has a field called "time". I need the results of my
>>> queries to come back in chronological order. So I put a
>>> "sort":{"time":{"order":"asc"}} in all my queries. This was going great
>>> on smaller data sets but then Elasticsearch started sending me 500s and
>>> circuit breaker exceptions started showing up in the logs with "data for
>>> field time would be too large". So I checked out
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>>>
>>> and that looks a lot like what I've been seeing: seems like it's trying to
>>> pull all the millions of time values into memory even if they're not
>>> relevant to my query. What are my options for fixing this? I can't
>>> compromise chronological order, it's at the heart of my application. "More
>>> memory" would be a short-term fix but the idea is to scale this thing to
>>> trillions and trillions of points and that's a race I don't want to run.
>>> Can I make these exceptions go away without totally tanking performance?
>>> Thanks!
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/60c63662-71b5-4e98-b125-995e357cd06e%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
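
For what it's worth, Jörg's discretization idea quoted above could be
sketched roughly as follows. The "ms" field is my own addition (it is not in
his list) so that millisecond ordering survives, the ISO layout with
fractional seconds and a trailing "Z" is an assumption about my input, and
the function names are hypothetical.

```python
from datetime import datetime

# Field order matters: most significant component first.
# "ms" is an assumed extra field, not part of the quoted suggestion.
SORT_FIELDS = ["year", "month", "day", "hour", "min", "sec", "ms"]


def split_timestamp(iso_string):
    """Split an ISO timestamp into low-cardinality integer fields.

    Assumes input shaped like "2014-10-02T17:28:23.123Z".
    """
    ts = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "ms": ts.microsecond // 1000,  # whole milliseconds within the second
    }


def sort_clause(order="asc"):
    """Build the multi-field sort array for the search request body."""
    return [{name: {"order": order}} for name in SORT_FIELDS]


if __name__ == "__main__":
    print(split_timestamp("2014-10-02T17:28:23.123Z"))
    print(sort_clause())
```

Each document would carry these fields alongside the original "time" value,
and every query would use the sort array instead of sorting on "time".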