Re: mass delete by query

slushi Mon, 31 Mar 2014 23:03:27 -0700

Yes, unfortunately it’s not completely known at index time. I would need to 
keep the separate indices in sync whenever a document's retention policy changes. 
Attempting this seems like it could open up a whole can of worms.
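
To make the trade-off concrete, here is a minimal sketch of the index-per-retention layout suggested in the reply quoted below. Everything in it is an assumption for illustration: the policy names, the `docs-<policy>-<YYYY.MM>` naming scheme, and the helper functions are hypothetical and not part of Elasticsearch. It only computes names and expiry; the actual index deletion and document moves would still have to be issued against the cluster.

```python
from datetime import date

# Hypothetical retention classes and their lifetimes in months (None = keep forever).
RETENTION_MONTHS = {"forever": None, "year": 12, "month": 1}


def index_for(policy: str, indexed_on: date) -> str:
    """Name of the index a document is sent to, e.g. 'docs-month-2014.03'."""
    if RETENTION_MONTHS[policy] is None:
        return "docs-forever"
    return f"docs-{policy}-{indexed_on:%Y.%m}"


def is_expired(index_name: str, today: date) -> bool:
    """True once every document in the index is past its retention period."""
    parts = index_name.split("-")
    if len(parts) != 3:  # 'docs-forever' has no date stamp and never expires
        return False
    _, policy, stamp = parts
    year, month = (int(p) for p in stamp.split("."))
    # Whole months elapsed since the end of the index's own month.
    age = (today.year - year) * 12 + (today.month - month) - 1
    return age >= RETENTION_MONTHS[policy]


def move_needed(old_policy: str, new_policy: str, indexed_on: date):
    """Return (source, target) index names if a policy change forces a move, else None."""
    src = index_for(old_policy, indexed_on)
    dst = index_for(new_policy, indexed_on)
    return None if src == dst else (src, dst)
```

Cleanup then becomes dropping whole indices for which `is_expired` is true. The can of worms is `move_needed`: every upgrade or downgrade means re-indexing the document into the target index and deleting it from the source.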


On Tuesday, April 1, 2014 1:58:04 AM UTC-4, David Pilato wrote:
>
> If you know in advance (I mean at index time) which docs should be removed, 
> you should send each of them to an index that will be removed entirely 
> after a given period.
>
>
> Makes sense?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 1 Apr 2014 at 00:00, slushi <[email protected]> wrote:
>
> I attended an Elasticsearch meetup and at some point it was mentioned 
> that TTL use is discouraged, but yes, this would make a lot of sense here. 
> Also, the 1 year figure is really a guesstimate; we want to keep as much of 
> that data as possible. I guess with TTL you may not have as much 
> control over when the document deletion and possible segment merging 
> happen? I am not that familiar with Elasticsearch performance tuning yet 
> (we just started looking into using ES).
>
> On Monday, March 31, 2014 5:52:28 PM UTC-4, Kevin Wang wrote:
>>
>> Why not use TTL for the documents? 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
>>
>> On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
>>>
>>> I have varying data retention requirements I am trying to balance (I am 
>>> continuously indexing new documents):
>>>
>>>    - 1% of my documents need to be kept forever
>>>    - 10% need to be kept 1 year
>>>    - the remainder needs to be kept for 1 month
>>>    
>>> I can easily set properties indicating the retention policy for each 
>>> document and then periodically do a "delete by query". However, since the 
>>> delete would remove 89% of the indexed documents, would there be any 
>>> potential performance problems with this straightforward approach? I guess 
>>> this is a YMMV type thing, but I was just wondering what the typical 
>>> approach is here. Would it be necessary to perhaps filter the query to not 
>>> affect so many documents at once? Would query performance be greatly 
>>> impacted?
>>>
>>> The alternative approach I was thinking of would be to create separate 
>>> indices for each retention type. Cleanup would be easier, but unfortunately 
>>> a document's retention policy can be upgraded/downgraded, so that could be a 
>>> little messy to keep consistent.
>>>
>>>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/9b685cff-e956-473a-935e-9546b2ea59b3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
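
For the "delete by query" approach described in the original post quoted above, the periodic cleanup query could be built roughly as follows. This is a sketch under stated assumptions, not a tested recipe: the field names `retention` and `indexed_at` and the policy values are hypothetical, the 30 and 365 day periods approximate "1 month" and "1 year", and the body uses the `bool`/`filter` query form of later Elasticsearch releases, so the releases current at the time of this thread may need a different syntax. It says nothing about the performance question, only about what the query would select.

```python
from datetime import datetime, timedelta

# Hypothetical retention classes. "forever" is simply absent, so it is never matched.
RETENTION = {"year": timedelta(days=365), "month": timedelta(days=30)}


def expired_docs_query(now: datetime) -> dict:
    """Build a query body matching every document past its retention period."""
    clauses = [
        {
            "bool": {
                "filter": [
                    # Document carries this retention class ...
                    {"term": {"retention": policy}},
                    # ... and was indexed before the cutoff for that class.
                    {"range": {"indexed_at": {"lt": (now - keep).isoformat()}}},
                ]
            }
        }
        for policy, keep in RETENTION.items()
    ]
    # A document is deleted if it matches at least one per-policy clause.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}
```

Because the policy is just a field on the document, upgrading or downgrading a document only means updating that field; nothing has to move between indices.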

