With 3 nodes, replica level 1 is not enough: it gives you only two copies of each shard (one primary, one replica). You must set replica level 2. With replica level 1 and an outage of 2 nodes, as you describe, you will lose data.
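To make the arithmetic concrete, here is a minimal Python sketch. The function names are hypothetical and only for illustration. It assumes every shard copy is fully allocated and in sync before the outage, which was not the case in your step 3, so treat it as a best-case bound and not a guarantee. The JSON it builds is the body you would send to `PUT /<your-index>/_settings`; the sketch does not contact a cluster.

```python
import json


def copies_per_shard(number_of_replicas: int) -> int:
    """Total copies of each shard: one primary plus its replicas."""
    return 1 + number_of_replicas


def survives_node_loss(number_of_replicas: int, nodes_lost: int) -> bool:
    """Best-case check: does at least one copy of every shard remain?

    Elasticsearch does not place two copies of the same shard on one node,
    so each lost node removes at most one copy of any given shard.
    Assumption: all copies were fully in sync before the nodes were lost.
    """
    return nodes_lost < copies_per_shard(number_of_replicas)


def replica_settings_body(number_of_replicas: int) -> str:
    """JSON body for the update-index-settings request."""
    return json.dumps({"index": {"number_of_replicas": number_of_replicas}})


if __name__ == "__main__":
    for replicas in (1, 2):
        ok = survives_node_loss(replicas, nodes_lost=2)
        print(f"replicas={replicas}: survives loss of 2 nodes? {ok}")
    print(replica_settings_body(2))
```

With replica level 2 on 3 nodes, every node holds a copy of every shard, so any single surviving node still has the full index.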
Jörg

On Wednesday, October 15, 2014 12:52:31 AM UTC+2, Evan Tahler wrote:
>
> Hi Mailing List! I'm a first-time poster, and a long-time reader.
>
> We recently had a crash in our ES (1.3.1 on Ubuntu) cluster which caused
> us to lose a significant volume of data. I have a "theory" on what
> happened to cause this, and I would love to hear your opinions on this, and
> if you have any suggestions to mitigate it.
>
> Here is a simplified play-by-play:
>
> 1. Cluster has 3 data nodes, A, B, and C. The index has 10 shards.
>    The index has a replica count of 1, so A is the master and B is a
>    replica. C is doing nothing. Re-allocation of indexes/shards is enabled.
> 2. A crashes. B takes over as master, and then starts transferring
>    data to C as a new replica.
> 3. B crashes. C is now master with a partial dataset.
> 4. There is a write to the index.
> 5. A and B finally reboot, and they are told that they are now stale
>    (as C had a write while they were away). Both A and B delete their local
>    data. A is chosen to be the new replica and re-syncs from C.
> 6. ... all the data A and B had which C never got is lost forever.
>
> Is the above scenario possible? If it is, it seems like the
> default behavior of ES might be better to not reallocate in this scenario?
> This would have caused the write in step #4 to fail, but in our use case,
> that is preferable to data loss.
