This sounds hugely useful to me and is one of those "why doesn't HBase have that" things that bugged me.
Is there an issue to watch?
http://search-hadoop.com/?q=region+failover+secondary&fc_project=HBase&fc_type=issue
doesn't find any.

Thanks,
Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html

On Mon, Jan 21, 2013 at 7:55 PM, Jonathan Hsieh <[email protected]> wrote:
> The main motivation is to maintain good performance on RS failovers.
> This is also tied to hdfs and its block placement policy. Let me
> explain as I understand it. If we control the hdfs block placement
> strategy we can write all blocks for an hfile (or for all hfiles
> related to a region) to the same set of data nodes. If the RS fails,
> they favor failover to a node that has a local copy of all the blocks.
>
> Today, when you write an hfile to hdfs, for each block the first
> replica goes to the local data node but the others get dispersed
> around the cluster randomly at a per-block granularity. The problem
> here is that if the rs fails, the new rs that gets the responsibility
> for the region has to read files that are spread all over the cluster
> and with roughly 1/nth of the data local. This means that the
> recovered region is slower until a compaction localizes the data again.
>
> They've gone in and modified hdfs and their hbase to take advantage of
> this idea. I believe the randomization policy is enforced per region
> -- if an rs serves 25 regions, all the files within each region are
> sent to the same set of secondary/tertiary nodes, but each region
> sends to a different set of secondary/tertiary nodes.
>
> Jon.
>
>
> On Mon, Jan 21, 2013 at 3:48 PM, Devaraj Das <[email protected]> wrote:
> > In 0.89-fb branch I stumbled upon stuff that indicated that there is a
> > concept of secondary and tertiary regionserver. Could someone with
> > more insights please shed some light on this?
> > Might be useful to do the analysis on whether it makes sense for trunk.
> >
> > Thanks
> > Devaraj
> >
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [email protected]
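
For anyone who wants to play with the locality argument in Jon's message, here is a minimal Python sketch. It is a toy model, not HBase or HDFS code: the function names, the 20-node cluster, and the 200-block region are all made up for illustration, replication is assumed to be 3, and rack awareness is ignored so that the second and third replicas are simply random, as described above.

```python
import random
from collections import Counter


def place_per_block(nodes, primary, num_blocks, rng):
    """Default-like policy (simplified): replica 1 on the writer's node,
    replicas 2 and 3 picked independently for every block."""
    others = [n for n in nodes if n != primary]
    return [[primary] + rng.sample(others, 2) for _ in range(num_blocks)]


def place_per_region(nodes, primary, num_blocks, rng):
    """Region-scoped policy (hypothetical sketch): pick one secondary and one
    tertiary node per region and send every block to the same three nodes."""
    others = [n for n in nodes if n != primary]
    favored = rng.sample(others, 2)
    return [[primary] + favored for _ in range(num_blocks)]


def best_failover_locality(placement, failed):
    """Fraction of the region's blocks that are local on the best
    surviving node after the node `failed` goes away."""
    counts = Counter(
        node for replicas in placement for node in replicas if node != failed
    )
    return max(counts.values()) / len(placement)


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [f"dn{i}" for i in range(20)]  # hypothetical 20-node cluster
    primary = "dn0"                        # node hosting the failed RS

    per_block = place_per_block(nodes, primary, 200, rng)
    per_region = place_per_region(nodes, primary, 200, rng)

    print("per-block placement, best failover locality: ",
          best_failover_locality(per_block, primary))
    print("per-region placement, best failover locality:",
          best_failover_locality(per_region, primary))
```

With per-region placement the secondary and tertiary nodes each hold every block, so failing over to either keeps the region fully local. With per-block placement even the best surviving node holds only a small fraction of the blocks, which is the slowdown that lasts until a compaction rewrites the data.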
