Re: HBase region server failure issues

Jonathan Hsieh Tue, 15 Apr 2014 08:16:59 -0700

It makes sense to have as many wals as # of spindles / replication factor
per machine.  This should be decoupled from the number of regions on a
region server.  So for a cluster with 12 spindles we should likely have at
least 4 wals (12 spindles / 3 replication factor), and need to do
experiments to see if going to 8 or some higher number makes sense (new wal
uses a disruptor pattern which avoids much contention on individual
writes).   So with your example, your 1000 regions would get sharded into
the 4 wals which would maximize io throughput, disk utilization, and reduce
time for recovery in the face of failure.

In the case of an SSD world, it makes more sense to have one wal per node
once we have decent HSM support in HDFS.  The key win here will be in
recovery time -- if any RS goes down we only have to replay a regions edits
and not have to split or demux different region's edits.

Jon.

On Mon, Apr 14, 2014 at 10:37 PM, Vladimir Rodionov
<[email protected]>wrote:

> Todd, how about 300 regions with 3x replication?  Or 1000 regions? This is
> going to be 3000 files. on HDFS. per one RS. When I said that it does not
> scale, I meant that exactly that.
>

-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// [email protected] // @jmhsieh

Re: HBase region server failure issues

Reply via email to