Your question implies a lot of other considerations, so I'll stick strictly to it. By default we retry 10 times to get access to data, with pauses that grow in increments of 2 seconds, in the same fashion as the BSD TCP SYN backoff table.
The configs are:

<property>
  <name>hbase.client.pause</name>
  <value>2000</value>
  <description>General client pause value. Used mostly as value to wait
  before running a retry of a failed get, region lookup, etc.</description>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <description>Maximum retries. Used as maximum for all retryable
  operations such as fetching of the root region from root region server,
  getting a cell's value, starting a row update, etc.
  Default: 10.</description>
</property>

And the backoff table is (from HConstants):

public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 };

So by default we wait 2, 2, 2, 4, 4, 8, (...) seconds before throwing a RetriesExhaustedException. With hbase.client.retries.number=1, you just wait 2 seconds, but I wouldn't recommend such a small number since any region split takes at least 6 seconds in 0.20.

Also be aware that we are planning to include master-slave replication between datacenters in 0.21.

J-D
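A minimal sketch of the fail-fast redirect Murali asks about below, assuming the 0.20-era client API: lower the retry budget so the client gives up quickly (with the defaults above, exhausting all 10 retries takes roughly 2+2+2+4+4+8+8+16+32+64 = 142 seconds), then catch RetriesExhaustedException and redirect the read to the duplicate table in the other data center. The class name, row, and secondary ZooKeeper quorum address are made-up placeholders, and whether such a fast failover is acceptable depends on your consistency needs.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.RetriesExhaustedException;

public class FailFastGet {
  public static Result get(String tableName, byte[] row) throws IOException {
    // Fail fast on the primary: 2 attempts with a 1-second base pause,
    // instead of the default 10 attempts (~142 seconds in total).
    HBaseConfiguration primary = new HBaseConfiguration();
    primary.setInt("hbase.client.retries.number", 2);
    primary.setInt("hbase.client.pause", 1000);
    try {
      return new HTable(primary, tableName).get(new Get(row));
    } catch (RetriesExhaustedException e) {
      // The region is unavailable on the primary; redirect the request to
      // the duplicate table in the other data center (placeholder quorum).
      HBaseConfiguration secondary = new HBaseConfiguration();
      secondary.set("hbase.zookeeper.quorum", "zk1.dc2.example.com");
      return new HTable(secondary, tableName).get(new Get(row));
    }
  }
}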
On Wed, Nov 25, 2009 at 8:45 PM, Murali Krishna. P <muralikpb...@yahoo.com> wrote:
> Thanks JD for the detailed reply.
>
> Does the underlying Java API currently block in case a region is not
> available? I would like to get an immediate retry indication for the Java
> call in such cases so that I can redirect the request to the duplicate
> table in the other data center. Can this be supported?
>
> Thanks,
> Murali Krishna
>
>
> ________________________________
> From: Andrew Purtell <apurt...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Thu, 26 November, 2009 12:17:30 AM
> Subject: Re: HBase High Availability
>
> First, there is work under way for 0.21 which will shorten the time
> necessary for region redeployment. Part of the delay in 0.20 is
> less-than-ideal performance in that regard by the master.
>
> Beyond that, just as a general operational principle, I recommend that you
> host no more than 200-250 regions per region server. The Bigtable paper
> talks about each tablet server hosting only 100 regions, with only 200 MB
> of data each. While that is not cost-effective for folks who do not build
> their own hardware in bulk, it should cause you to think about why:
> - Limiting the number of regions per tablet server limits time to recovery
> upon node failure -- you can engineer this to be within some threshold
> - Limiting the amount of data per region means that servers with
> reasonable RAM can cache and serve a lot of the data out of memory for
> sub-disk data access latencies
>
> So the advice here is to opt for more servers, not fewer; more RAM, not
> less; and smaller disks, not larger.
>
> You should also consider the impact of server failure on HDFS -- the loss
> of block replicas. For each under-replicated block, HDFS must work to make
> additional copies. This can come at a bad time if the loss of the blocks
> in the first place was due to overloading.
>
> Smaller disks mean fewer lost block replicas. For example, attach
> 4 x 160 GB drives as JBOD (as opposed to 4 x 1 TB or similar). Losing one
> disk means a loss of only 160 GB worth of block replicas (as opposed to
> 1 TB). Loss of a whole server means losing only 640 GB worth of block
> replicas (as opposed to 4 TB).
>
> You can also consider attaching 6 or 8 or even more modest-sized disks per
> server to increase the I/O parallelism (number of spindles) while also
> constraining the amount of block replica loss per disk failure.
>
> Even so, blocked reads and writes over some interval during region
> redeployment, due to server failure or load rebalancing, are part of the
> Bigtable architecture, and so of HBase, unless we take additional steps
> such as setting up active-passive region server pairs; but that would have
> complications affecting consistency and performance, and might not provide
> enough benefit anyway (there is still time needed to detect failure and
> fail over). This is not an unavailability of the Bigtable service. Other
> regions are not affected. This is graceful/proportional service
> degradation in the face of partial failures. There are alternatives to
> Bigtable which degrade differently given partial failures. Such options
> can give you no waiting on the write path at any time, and possibly no
> waiting on the read path, but you will lose strong consistency as the
> trade-off. So you may get stale answers over some (unbounded, iirc)
> period, but this is the choice you make.
>
> HBase also has options like Stargate or the Thrift connector, which can
> block and retry on behalf of your clients so they are never blocked for
> writes. For read path options I could look at having Stargate serve
> (possibly stale) answers out of a cache -- with some flag that indicates
> noncanonical state -- if that would be useful, and/or return an immediate
> "try again" indication, so your clients are at least not stalled.
>
> Best regards,
>
>   - Andy
>
>
> ________________________________
> From: Murali Krishna. P <muralikpb...@yahoo.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, November 25, 2009 1:31:45 AM
> Subject: HBase High Availability
>
> Hi,
> This is regarding the region unavailability when a region server goes
> down. There will be cases where we have thousands of regions per RS, and
> it takes a considerable amount of time to redistribute the regions when a
> node fails. The service will be unavailable during that period. I am
> evaluating HBase for an application where we need to guarantee close to
> 100% availability (the namenode is still a SPOF; leave that aside).
>
> One simple idea would be to replicate the regions in memory. Can we load
> the same region in multiple region servers? I am not sure about the
> feasibility yet; there will be issues like consistency across these
> in-memory replicas. Wanted to know whether there were any thoughts / work
> already going on in this area? I saw some related discussion here
> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html, not
> sure what its state is.
>
> The same needs to be done with the master as well, or is it already
> handled by ZK? How fast are the master re-election and catalog load
> currently? Do we always have multiple masters in a ready-to-run state?
>
> Thanks,
> Murali Krishna