I agree with the original poster that none of the existing solutions are 
ideal.  Making it simpler and safer to roll out revised mappings would be a 
huge win if your use case involves incremental revisions/refinements to 
your indexing strategies.  A lossless solution would especially benefit the 
case where ES is being used as the primary data store (an option we have 
been considering), since you really don't want to drop a record in that 
case.
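For what it's worth, the marker-field idea JoeZ99 describes below could be sketched roughly as follows. This is a hypothetical Python sketch that only builds the request bodies; actually sending them (with the official client or curl) is omitted, and every name here (the marker field, index names, alias) is made up for illustration:

```python
# Hypothetical sketch of the marker-field catch-up approach.
# Only constructs request bodies; nothing here talks to a cluster.

MARKER = "_reindex_dirty"   # hypothetical extra field tagged onto writes


def tag_for_replay(doc):
    """While the scan/scroll runs, writes still go to the old index,
    but carry a marker so they can be replayed afterwards."""
    tagged = dict(doc)
    tagged[MARKER] = True
    return tagged


def dirty_docs_query():
    """Query body to fetch every document updated during the scan."""
    return {"query": {"term": {MARKER: True}}}


def build_replay_actions(hits, new_index):
    """Turn marked documents into bulk-index action/source pairs for
    the new index, stripping the marker on the way."""
    actions = []
    for hit in hits:
        source = dict(hit["_source"])
        source.pop(MARKER, None)
        actions.append({"index": {"_index": new_index, "_id": hit["_id"]}})
        actions.append(source)
    return actions


def alias_swap_actions(alias, old_index, new_index):
    """One atomic alias update: remove old index, add new index."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}
```

Note the remaining gap: a write that lands between the replay query and the alias swap can still be missed, so this narrows the race window rather than closing it.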


On Monday, February 24, 2014 9:20:56 AM UTC-5, JoeZ99 wrote:
>
> How about this: while the scan is being done, let updates go to the old 
> index but with an extra field. Once the alias points to the new index, it's 
> just a query to fetch the documents with that new field from the old index 
> and then reindex them into the new one. If the alias change/new index 
> creation is unsuccessful, then update the old index to remove that new 
> field.
>
> On Friday, February 21, 2014 3:11:52 AM UTC-5, Andrew Kane wrote:
>>
>> I tried to post a reply yesterday but it looks like it never made it.
>>
>> Thank you all for the quick replies.  Here's a slightly better 
>> explanation of where I believe the race condition occurs.
>>
>> When the scan/scroll starts, the alias is still pointing to the old 
>> index, so updates go to the old index.  Let's say you update Document 1.  If 
>> the scroll/scan has already passed Document 1, the new index never sees the 
>> update.  The three solutions you mentioned, Nik, are to:
>>
>> 1. Keep track of updates manually [tedious]
>> 2. Pause the jobs that perform the updates [out of sync]
>> 3. Send updates to both indexes [also tedious]
>>
>> However, none of these seem ideal.
>>
>> - Andrew
>>
>> On Tuesday, February 18, 2014 8:41:18 PM UTC-8, Andrew Kane wrote:
>>>
>>> Hi,
>>>
>>> I've followed the documentation for zero-downtime mapping changes and it 
>>> works great.  
>>> http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
>>>
>>> However, there is a (pretty big) race condition with this approach - 
>>> while reindexing, changes may not make it to the new index.  I've looked 
>>> all over and haven't found a single solution to address this.  The best 
>>> attempt I've seen is to buffer updates, but this is tedious and still 
>>> leaves a race condition (with a smaller window).  My initial thoughts were 
>>> to create a write alias that points to the old and new indices and use 
>>> versioning.  However, there is no way to write to multiple indices 
>>> atomically.
>>>
>>> It seems like this issue should affect most Elasticsearch users (whether 
>>> they realize it or not).  Does anyone have a good solution to this?
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
