I have varying data retention requirements I am trying to balance (I am continuously indexing new documents):
- 1% of my documents need to be kept forever
- 10% need to be kept for 1 year
- the remainder needs to be kept for 1 month

I can easily set properties indicating the retention policy for each document and then periodically do a "delete by query". However, since the delete would remove 89% of the indexed documents, would there be any potential performance problems with this straightforward approach?

I guess this is a YMMV type thing, but I was just wondering what the typical approach is here. Would it be necessary to perhaps filter the query to not affect so many documents at once? Would query performance be greatly impacted?

The alternative approach I was thinking of would be to create separate indices for each retention type. Cleanup would be easier, but unfortunately a document's retention policy can be upgraded/downgraded, so that could be a little messy to keep consistent.
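
For concreteness, here is a rough sketch of the kind of "delete by query" body I have in mind. The field names `retention` and `indexed_at` and the policy labels `1y` / `1m` are just placeholders I made up for illustration, and the `bool`/`filter` form assumes a reasonably recent Elasticsearch version, so treat it as a sketch rather than a tested setup.

```python
import json

# Hypothetical field names, chosen only for this illustration.
RETENTION_FIELD = "retention"
TIMESTAMP_FIELD = "indexed_at"

# Retention policy label -> Elasticsearch date-math cutoff.
# "forever" is deliberately absent, so those documents never match.
CUTOFFS = {"1y": "now-1y", "1m": "now-1M"}


def expired_docs_query(cutoffs=CUTOFFS):
    """Build a query body matching documents older than their policy allows."""
    clauses = [
        {
            "bool": {
                "filter": [
                    {"term": {RETENTION_FIELD: policy}},
                    {"range": {TIMESTAMP_FIELD: {"lt": cutoff}}},
                ]
            }
        }
        for policy, cutoff in sorted(cutoffs.items())
    ]
    # A document is expired if it matches any one of the per-policy clauses.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(expired_docs_query(), indent=2))
```

The resulting JSON would be sent as the body of the periodic delete-by-query request. Because the policy lives in a field on the document, upgrading or downgrading a document is just an update to that field, which is the main thing the separate-indices approach makes awkward.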
