In your use case, could the retention policy change for the 89% of documents? If not, I would create one index for the documents that could have a moving retention policy and use _ttl there. For the monthly docs, I would use an index per month.
If that's not the case, I think you will have to go with _ttl, at the cost of heavier merges.

My 2 cents.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On Apr 1, 2014, at 08:03, slushi <[email protected]> wrote:

Yes, unfortunately it's not completely known at index time. I would need to keep the separate indices in sync when a retention policy change occurs. Attempting this seems like it could open up a whole can of worms.

> On Tuesday, April 1, 2014 1:58:04 AM UTC-4, David Pilato wrote:
> If you know in advance which doc should be removed (I mean at index time),
> you should send the document to an index which should be entirely removed
> after a given period.
>
> Makes sense?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On Apr 1, 2014, at 00:00, slushi <[email protected]> wrote:
>
> I attended an Elasticsearch meetup and at some point it was mentioned that
> TTL use is discouraged, but yes, this would make a lot of sense here. Also the
> 1 year thing is really a guesstimate; we want to keep as much of that data as
> possible. I guess maybe with TTL you may not have as much control over when
> document deletion and possible segment merging happen? I am not that familiar
> with Elasticsearch performance stuff yet (we just started looking into using ES).
>
>> On Monday, March 31, 2014 5:52:28 PM UTC-4, Kevin Wang wrote:
>> Why not use TTL for documents?
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
>>
>>> On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
>>> I have varying data retention requirements I am trying to balance (I am
>>> continuously indexing new documents):
>>> 1% of my documents need to be kept forever
>>> 10% need to be kept 1 year
>>> the remainder needs to be kept for 1 month
>>> I can easily set properties indicating the retention policy for each
>>> document and then periodically do a "delete by query". However, since the
>>> delete would remove 89% of the indexed documents, would there be any
>>> potential performance problems with this straightforward approach? I guess
>>> this is a YMMV type thing, but I was just wondering what the typical
>>> approach is here. Would it be necessary to perhaps filter the query to not
>>> affect so many documents at once? Would query performance be greatly
>>> impacted?
>>>
>>> The alternate approach I was thinking of would be to create separate indices
>>> for each retention type. Cleanup would be easier, but unfortunately a
>>> document's retention policy can be upgraded/downgraded, so that could be a
>>> little messy to keep consistent.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9b685cff-e956-473a-935e-9546b2ea59b3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
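For anyone reading this thread later, here is a rough sketch of the index-per-month idea discussed above. It is only an illustration under some assumptions, not something taken from the thread: the index names ("docs-1m-YYYY.MM", "docs-longlived"), the retention labels and both helper functions are made up, and the code only computes names. Actually creating or deleting the indices, and expiring individual documents in the shared index via _ttl, is left to whatever client you use.

```python
from datetime import date

# Hypothetical retention labels; the thread only describes the three tiers.
FOREVER, ONE_YEAR, ONE_MONTH = "forever", "1y", "1m"

MONTHLY_PREFIX = "docs-1m-"        # assumed naming scheme for per-month indices
LONGLIVED_INDEX = "docs-longlived"  # assumed shared index for everything else


def index_for(retention: str, indexed_on: date) -> str:
    """Pick a target index name for a document at index time."""
    if retention == ONE_MONTH:
        # Short-lived docs go to a per-month index that can be dropped whole.
        return f"{MONTHLY_PREFIX}{indexed_on:%Y.%m}"
    # Docs whose policy may still change share one index and would be
    # expired individually (for example with _ttl).
    return LONGLIVED_INDEX


def expired_monthly_indices(existing, today: date, keep_months: int = 1):
    """Return the per-month indices that fall outside the retention window."""
    # Express months as a single integer so comparison is plain arithmetic.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in existing:
        if not name.startswith(MONTHLY_PREFIX):
            continue  # never touch the shared long-lived index
        year, month = name[len(MONTHLY_PREFIX):].split(".")
        if int(year) * 12 + (int(month) - 1) < cutoff:
            expired.append(name)
    return sorted(expired)
```

With this scheme a whole monthly index is only reported as expired once every document in it is at least keep_months old, so some documents live a little longer than one month. A document whose policy is upgraded after it landed in a monthly index would still have to be copied into the shared index before that month is dropped, which is the consistency problem slushi describes.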
