Actually, the date section of 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
seems to suggest that doc_values is valid for date fields, even though dates 
aren't mentioned in 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html.
Maybe I can get away with not disabling the date parsing after all...
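
Here's roughly the mapping I'm going to try (ES 1.x syntax; the index and 
type names and the "date_time" format are just stand-ins for my setup):

PUT /myindex/_mapping/mydoc
{
    "mydoc": {
        "properties": {
            "time": {
                "type": "date",
                "format": "date_time",
                "doc_values": true
            }
        }
    }
}

If that works, the parsed dates live on disk in doc_values instead of on the 
fielddata heap, and the circuit breaker should stop tripping.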



On Thursday, October 2, 2014 5:28:23 PM UTC-7, Dave Galbraith wrote:
>
> Thanks for the input, Jörg, but millisecond granularity is 
> application-critical here. I'm trying to use the doc_values fielddata 
> format for my time field. Currently I'm sending over ISO strings, and 
> Elasticsearch is parsing them into dates internally, which I understand will 
> stop me from using doc_values. I can turn that parsing off, though: are 
> there any dire performance/workability implications from forcing it to 
> store my dates as strings instead of parsing them into dates?
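>
> If I do go the string route, the mapping I have in mind is something like 
> this (just a sketch; "time" stays the field name):
>
> "time": {
>     "type": "string",
>     "index": "not_analyzed",
>     "doc_values": true
> }
>
> Since all my ISO strings share one format and timezone, lexicographic 
> order matches chronological order, so sorting on the raw strings should 
> still come back in time order.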
>
>
>
> On Thursday, October 2, 2014 4:12:35 PM UTC-7, Jörg Prante wrote:
>>
>> If you really want to sort on a timestamp, use a discretization strategy 
>> for better performance.
>>
>> If you use a millisecond-resolution timestamp, ES will have to load all 
>> the values of the field into memory, because they are nearly all unique. 
>> This is quite massive.
>>
>> But if you store the year, month, day, hour, minute, etc. in separate 
>> fields, down to the resolution you want (e.g. seconds), you can create 
>> fields with far fewer unique values.
>>
>> Then you can sort on multiple fields with very little memory consumption, 
>> something like this:
>>
>> "sort": [
>>         { "year":   { "order": "asc" }},
>>         { "month": { "order": "asc" }},
>>         { "day": { "order": "asc" }},
>>         { "hour": { "order": "asc" }},
>>         { "min": { "order": "asc" }},
>>         { "sec": { "order": "asc" }}
>>     ]
>>
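>> For example, with a resolution of seconds, a timestamp like 
>> 2014-10-02T17:28:23 would be indexed as separate numeric fields (field 
>> names chosen to match the sort above):
>>
>> {
>>     "year":  2014,
>>     "month": 10,
>>     "day":   2,
>>     "hour":  17,
>>     "min":   28,
>>     "sec":   23
>> }
>>
>> Each field then has very few distinct values (never more than 60, even 
>> for "min" and "sec"), so the fielddata for each one stays small.
>>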
>> Jörg
>>
>>
>>
>> On Thu, Oct 2, 2014 at 3:29 AM, Dave Galbraith <[email protected]> 
>> wrote:
>>
>>> Hi! So I have millions and millions of documents in my Elasticsearch 
>>> cluster, each of which has a field called "time". I need the results of 
>>> my queries to come back in chronological order, so I put 
>>> "sort":{"time":{"order":"asc"}} in all my queries. This was going great 
>>> on smaller data sets, but then Elasticsearch started sending me 500s, and 
>>> circuit breaker exceptions started showing up in the logs with "data for 
>>> field time would be too large". So I checked out 
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>>> and that looks a lot like what I've been seeing: it seems like 
>>> Elasticsearch is trying to pull all the millions of time values into 
>>> memory, even the ones that aren't relevant to my query. What are my 
>>> options for fixing this? I can't compromise on chronological order; it's 
>>> at the heart of my application. "More memory" would be a short-term fix, 
>>> but the idea is to scale this thing to trillions and trillions of points, 
>>> and that's a race I don't want to run. Can I make these exceptions go 
>>> away without totally tanking performance? Thanks!
>>>
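>>> For reference, my queries look something like this (the index name and 
>>> the match_all are placeholders; the sort clause is the constant part):
>>>
>>> POST /myindex/_search
>>> {
>>>     "query": { "match_all": {} },
>>>     "sort": [ { "time": { "order": "asc" } } ]
>>> }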
