Hi Eugene,

Thanks for your comments - I'll do my best to explain where I am coming from, and to address some of the issues you have raised.
Firstly, where I'm coming from: the data I'm holding and searching against needs to be 100% backed up because it will need to be audited in the future. For that reason the data is held in an old-fashioned multi-master replicated relational DB.

In terms of the issues you raised:

1) But how is this different from any other DB?

i) With relational DBs it is part of the strategy to replay the transaction logs to recover any data that hasn't been backed up. I've heard of people doing this with ES, but it is not well documented anywhere. Additionally, the transaction logs, to my limited understanding, are kept in the same area as the index files and can suffer corruption. I think there may be some monitoring in version 1.0 to stop ES writing to disk before the files become corrupted, which would help. But the first point stands: there is no clear transaction log replay strategy outlined for Elasticsearch.

ii) Multi-master replication - no doubt it's possible to arrange JMS queues or Hazelcast/Coherence grids to do this - but a built-in solution would be useful.

2) Examples of data loss - when upgrading Elasticsearch versions I've ended up losing all data, no doubt through my own fault. Maybe I'd have been more careful, and read the upgrade instructions more carefully, if I'd known that my data was not backed up in the relational database, but it is definitely something that plays on my mind: "If I screw up this upgrade process, or misunderstand it, then that's it, my data is gone."

So I would probably add the following, although I could be wrong, because I have not read every blog post relating to ES upgrades:

1) But how is this different from any other DB?

iii) There is no clear, consistent, well-documented process for upgrading Elasticsearch versions, particularly when the underlying Lucene version changes.

David.

On Tuesday, 14 January 2014 20:13:22 UTC, Eugene Strokin wrote:
>
> You are correct. But how is this different from any other DB?
> I guess the question is more like: if I'm running ES under normal
> conditions, could the index get corrupted?
> If this is a hardware issue, and you have replication switched on, then you
> wouldn't be affected much. Your system will continue functioning but the
> state would become yellow. You'd need to replace the node and that is it.
> Some people have claimed that they experienced sudden index corruption with
> data loss. I myself never saw anything like this. Even though I have done
> stupid things a few times, and had near heart-attack feelings, the data
> wasn't lost in the end, and again I have nothing to blame but myself.
>
> Regarding stability I can say that ES has not given us any problems. I
> have done the following successfully in a production environment with zero
> downtime:
> - adding nodes and replication
> - transitioning data to another data center
> - adding more clients
> Etc...
>
> I'd really like to hear from people who experienced data loss. If someone
> would provide details, this would help us to understand what was wrong and
> what we should avoid doing.
> But besides claims that there are such cases, I haven't heard anything else.
>
> Eugene
