I am continuously indexing new documents and need to balance several data 
retention requirements:

   - 1% of my documents need to be kept forever
   - 10% need to be kept for 1 year
   - the remainder (89%) needs to be kept for 1 month
   
I can easily set a property on each document indicating its retention 
policy and then periodically run a "delete by query". However, since the 
deletes would remove 89% of the indexed documents, are there any potential 
performance problems with this straightforward approach? I guess this is a 
YMMV type of thing, but I am wondering what the typical approach is here. 
Would it be necessary to filter the query so that it does not affect so 
many documents at once? Would query performance be greatly impacted?
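
To make the first approach concrete, here is a rough sketch of what I have 
in mind. The policy names, the `expires_at` field, and both helper functions 
are made up for illustration; the code only builds the query body and does 
not call any client. The optional `window` argument is one way the query 
could be filtered so that each run only touches a slice of the expired 
documents.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy names mapped to retention periods; None = keep forever.
RETENTION = {
    "forever": None,
    "one_year": timedelta(days=365),
    "one_month": timedelta(days=30),
}

def expiry_for(policy, indexed_at):
    """Return the expiry timestamp to store on a document, or None if the
    document should never expire (in which case the field is left unset)."""
    period = RETENTION[policy]
    return None if period is None else indexed_at + period

def expired_query(now, window=None):
    """Build a range query body matching documents whose (assumed)
    `expires_at` field is in the past. If `window` is given, only documents
    that expired within the last `window` are matched."""
    bounds = {"lte": now.isoformat()}
    if window is not None:
        bounds["gt"] = (now - window).isoformat()
    return {"query": {"range": {"expires_at": bounds}}}
```

The idea is that this body would be sent as the periodic "delete by query", 
for example once a day with `window=timedelta(days=1)`. Documents kept 
forever would simply have no `expires_at` value, so I would expect the range 
clause to skip them.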

The alternative I was considering is to create a separate index for each 
retention type. Cleanup would be easier, but unfortunately a document's 
retention policy can be upgraded or downgraded, so keeping the indices 
consistent could get a little messy.
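
A minimal sketch of the bookkeeping this would need, assuming one index per 
policy with a made-up `docs-<policy>` naming scheme. The function names are 
hypothetical, and the sketch only plans the steps; it does not talk to the 
cluster.

```python
def index_for(policy):
    """Map a retention policy name to its (hypothetical) index name."""
    return "docs-" + policy

def plan_policy_change(doc_id, old_policy, new_policy):
    """Return the ordered steps needed to move a document when its
    retention policy changes. The copy into the new index comes first so
    the document is never missing from both indices."""
    if old_policy == new_policy:
        return []
    return [
        ("index", index_for(new_policy), doc_id),
        ("delete", index_for(old_policy), doc_id),
    ]
```

The messy part is the gap between the two steps: the document briefly 
exists in both indices, and if the delete fails it stays in both until 
something cleans it up.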
