> You are talking about durability, not HA.

Good point, thanks. I meant HA for the data, but data durability makes more sense.

> To have a better understanding I recommend reading our architecture
> page http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture and the
> Bigtable paper.

Thanks, I've been studying that today.

> In short, when you write a row it goes into the write-ahead-log and
> then right after that in MemStore. Once the MemStore is full (64MB) or
> for some other reasons, it is flushed to disk where the file is
> replicated (transparently).

Each RegionStore has its own WAL, yes? From the Architecture page:

    When a write request is received, it is first written to a
    write-ahead log called a HLog. All write requests for every region
    the region server is serving are written to the same log. Once the
    request has been written to the HLog, it is stored in an in-memory
    cache called the Memcache. There is one Memcache for each HStore.

Which confuses me, if the write goes straight to a RegionServer, but then the RegionServer fails before the MemStore is flushed, did I just lose data?

> If the node fails, the Master will process the WAL so that you don't

So do all writes go through the Master? Clearly I'm a bit confused here :)

> lose rows in the MemStore. Prior to Hadoop 0.21 (unreleased), the

Moral of the story is to upgrade to 0.21 ASAP. :)

Thanks!
Seth
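
P.S. To check my understanding of the write path described in the quoted text above (log first, then memory, flush when full, replay the log after a failure), here is a toy Python model. It is only a sketch and not HBase code: `ToyRegionServer`, `recover`, and the tiny `flush_threshold` are made-up names for illustration, and it assumes the log itself survives the crash, which is the part the quoted text suggests depends on the Hadoop version.

```python
class ToyRegionServer:
    """Toy model of the write path; hypothetical, not the HBase API."""

    def __init__(self, wal, flush_threshold=3):
        self.wal = wal                  # shared log, assumed to survive a crash
        self.memstore = {}              # in-memory edits, lost on a crash
        self.flushed = {}               # stands in for files flushed to disk
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))   # 1. write-ahead log first
        self.memstore[row] = value      # 2. then the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.flushed.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()                # flushed edits no longer need replay

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        return self.flushed.get(row)


def recover(wal, flushed):
    """Replay the surviving log on top of the flushed data (hypothetical helper)."""
    server = ToyRegionServer(wal=[])
    server.flushed = dict(flushed)
    for row, value in list(wal):
        server.put(row, value)
    return server


wal = []
rs = ToyRegionServer(wal, flush_threshold=3)
rs.put("row1", "a")
rs.put("row2", "b")                     # still below the flush threshold

# Simulated crash: rs.memstore is gone; only wal and rs.flushed remain.
survivor = recover(wal, rs.flushed)
```

In this model the two unflushed rows come back only because the log was written before the in-memory store and outlived the server. Note that the client writes go to the server directly; the log replay is the only step that happens elsewhere.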
