In your use case, could the retention policy change for the 89% of documents 
that are kept for one month?
If not, I would create one index for the documents whose retention policy can 
change and use _ttl there. For the monthly docs, I would use one index per month.

If it can change, I think you will have to rely on _ttl for them as well, at the 
cost of heavier merges.
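
A minimal sketch of the index-per-month idea, assuming a hypothetical naming 
scheme (docs-monthly-YYYY.MM) and a hypothetical keep_months setting; neither 
comes from Elasticsearch itself. It only works out index names and which 
monthly indices are old enough to drop as a whole. It makes no client calls.

```python
from datetime import date
from typing import Iterable, List

MONTHLY_PREFIX = "docs-monthly-"  # hypothetical naming scheme, not an ES default


def monthly_index(day: date) -> str:
    """Name of the monthly index that a document indexed on `day` goes to."""
    return f"{MONTHLY_PREFIX}{day.year:04d}.{day.month:02d}"


def expired_monthly_indices(
    names: Iterable[str], today: date, keep_months: int = 1
) -> List[str]:
    """Monthly indices in which every document is older than `keep_months`.

    An index for month M may still hold a document written on the last day
    of M, so it only becomes droppable once more than `keep_months` whole
    months separate M from the current month.
    """
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in names:
        if not name.startswith(MONTHLY_PREFIX):
            continue  # e.g. the _ttl-managed index, left alone here
        year, month = name[len(MONTHLY_PREFIX):].split(".")
        age_in_months = current - (int(year) * 12 + int(month) - 1)
        if age_in_months > keep_months:
            expired.append(name)
    return sorted(expired)
```

Dropping a whole index avoids marking documents as deleted one by one, which is 
why it should be cheaper than a large delete by query.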


My 2 cents.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 1 Apr 2014, at 08:03, slushi <[email protected]> wrote:

Yes, unfortunately it’s not completely known at index time. I would need to 
keep the separate indices in sync when a retention policy change occurs. 
Attempting this seems like it could open up a whole can of worms. 
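
To make the concern concrete, here is a rough sketch of the move a policy 
change would require. Plain dictionaries stand in for the indices, and the 
function name, the "retention" field and the index names are all hypothetical. 
It does not talk to Elasticsearch.

```python
from typing import Dict

# In-memory stand-in: index name -> {document id -> document body}.
Indices = Dict[str, Dict[str, dict]]


def move_document(
    indices: Indices, doc_id: str, source: str, target: str, new_policy: str
) -> dict:
    """Re-home one document after its retention policy has changed."""
    doc = indices[source][doc_id]  # raises KeyError if the document is missing
    updated = {**doc, "retention": new_policy}
    if source == target:
        # Same index: only the stored policy changes.
        indices[source][doc_id] = updated
        return updated
    # Step 1: write to the target first, so a failure never loses the document.
    indices.setdefault(target, {})[doc_id] = updated
    # Step 2: remove the old copy. Between the two steps the document exists
    # in both indices, so a search across both could return it twice.
    del indices[source][doc_id]
    return updated
```

The two steps are not atomic, which is the part that worries me.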

> On Tuesday, April 1, 2014 1:58:04 AM UTC-4, David Pilato wrote:
> If you know in advance which doc should be removed (I mean at index time), 
> you should send the document to an index which should be entirely removed 
> after a given period.
> 
> 
> Makes sense?
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
> 
> On 1 Apr 2014, at 00:00, slushi <[email protected]> wrote:
> 
> I attended an Elasticsearch meetup and at some point it was mentioned that 
> TTL use is discouraged, but yes, this would make a lot of sense here. Also, the 
> 1 year figure is really a guesstimate; we want to keep as much of that data as 
> possible. I guess with TTL you may not have as much control over when document 
> deletion and possible segment merging happen? I am not that familiar with 
> Elasticsearch performance yet (we just started looking into using ES).
> 
>> On Monday, March 31, 2014 5:52:28 PM UTC-4, Kevin Wang wrote:
>> Why not use TTL for documents? 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
>> 
>>> On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
>>> I have varying data retention requirements I am trying to balance (I am 
>>> continuously indexing new documents):
>>> 1% of my documents need to be kept forever
>>> 10% need to be kept 1 year
>>> the remainder needs to be kept for 1 month
>>> I can easily set properties indicating the retention policy for each 
>>> document and then periodically do a "delete by query". However, since the 
>>> delete would remove 89% of the indexed documents, would there be any 
>>> potential performance problems with this straightforward approach? I guess 
>>> this is a YMMV type thing, but I was just wondering what the typical 
>>> approach is here. Would it be necessary to perhaps filter the query to not 
>>> affect so many documents at once? Would query performance be greatly 
>>> impacted?
>>> 
>>> The alternate approach I was thinking would be to create separate indices 
>>> for each retention type. Cleanup would be easier, but unfortunately a 
>>> document's retention policy can be upgraded/downgraded so that could be a 
>>> little messy to keep consistent.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/9b685cff-e956-473a-935e-9546b2ea59b3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
