Why not set a TTL on each document? See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
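
A rough sketch of what I mean, assuming a 1.x cluster where the `_ttl` field is available. The policy names and the `ttl_mapping` / `index_params` helpers are made up for illustration; only the `_ttl` mapping setting and the `ttl` index parameter come from the docs linked above. No default TTL is set in the mapping, so documents indexed without a `ttl` are not expired.

```python
# Hypothetical policy names; the durations are the ones from the question.
RETENTION_TTL = {
    "forever": None,     # no TTL at all, so the document never expires
    "one_year": "365d",
    "one_month": "30d",
}


def ttl_mapping(doc_type):
    """Mapping body that enables _ttl for a type, deliberately without a default."""
    return {doc_type: {"_ttl": {"enabled": True}}}


def index_params(policy):
    """Query-string parameters to send with the index request for a policy."""
    ttl = RETENTION_TTL[policy]
    return {} if ttl is None else {"ttl": ttl}


if __name__ == "__main__":
    print(ttl_mapping("doc"))
    for policy in RETENTION_TTL:
        print(policy, index_params(policy))
```

Upgrading or downgrading a document's retention would then presumably mean re-indexing it with a different `ttl` value.
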
On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
>
> I have varying data retention requirements I am trying to balance (I am
> continuously indexing new documents):
>
> - 1% of my documents need to be kept forever
> - 10% need to be kept 1 year
> - the remainder needs to be kept for 1 month
>
> I can easily set properties indicating the retention policy for each
> document and then periodically do a "delete by query". However, since the
> delete would remove 89% of the indexed documents, would there be any
> potential performance problems with this straightforward approach? I guess
> this is a YMMV type thing, but I was just wondering what the typical
> approach is here. Would it be necessary to perhaps filter the query to not
> affect so many documents at once? Would query performance be greatly
> impacted?
>
> The alternate approach I was thinking would be to create separate indices
> for each retention type. Cleanup would be easier, but unfortunately a
> document's retention policy can be upgraded/downgraded so that could be a
> little messy to keep consistent.
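
If you stay with the delete-by-query approach from the question, the request bodies could look roughly like the sketch below. The field names `retention` and `indexed_at` and the policy names are assumptions, not something from your mapping, and the `filtered` query shape is the 1.x-era query DSL. The code only builds the bodies and does not talk to a cluster.

```python
import json

# Hypothetical policy names; "forever" is absent because it is never deleted.
MAX_AGE = {"one_year": "365d", "one_month": "30d"}


def expired_query(policy):
    """Body matching documents of one policy that are older than its window."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            # "retention" and "indexed_at" are assumed field names.
                            {"term": {"retention": policy}},
                            {"range": {"indexed_at": {"lt": "now-" + MAX_AGE[policy]}}},
                        ]
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    for policy in MAX_AGE:
        print(json.dumps(expired_query(policy)))
```

Run on a frequent schedule (daily, say), each pass should only match the documents that aged out since the previous pass, rather than 89% of the index in one go.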
