On Thu, Nov 26, 2009 at 12:05 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
<snip />
>
> Be also aware that we are planning to include a master-slave
> replication between datacenters in 0.21.
>
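Until that replication feature ships, a client that needs a second
datacenter has to do the redirection itself, along the lines Murali
describes below. As a rough, untested sketch against the 0.20 client API
(the quorum hosts, table name, and the whole fail-over wrapper here are
made up for illustration; HBase does not provide this out of the box):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedException;

public class DualClusterTable {

  private final HTable primary;
  private final HTable secondary;

  public DualClusterTable() throws IOException {
    // Primary cluster: shorten the retry window so an unavailable
    // region surfaces as an exception quickly instead of the client
    // blocking through the default ten retries.
    HBaseConfiguration primaryConf = new HBaseConfiguration();
    primaryConf.set("hbase.zookeeper.quorum", "zk1.dc1.example.com");
    primaryConf.setInt("hbase.client.retries.number", 2);
    primaryConf.setLong("hbase.client.pause", 500); // ms between retries

    // Second cluster in the other datacenter, hosting the duplicate table.
    HBaseConfiguration secondaryConf = new HBaseConfiguration();
    secondaryConf.set("hbase.zookeeper.quorum", "zk1.dc2.example.com");

    primary = new HTable(primaryConf, "mytable");
    secondary = new HTable(secondaryConf, "mytable");
  }

  public void put(Put put) throws IOException {
    try {
      primary.put(put);
    } catch (RetriesExhaustedException e) {
      // The region stayed unavailable through all retries; redirect
      // the write to the duplicate table in the other datacenter.
      secondary.put(put);
    }
  }
}

The trade-off is that keeping the two tables consistent becomes entirely
the application's problem, which is exactly what the planned replication
is meant to solve.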
From this discussion and a presentation by Ryan Rawson and Jonathan Gray,
I am really looking forward to the 0.21 release. Any idea on the timeline?

- Imran

> J-D
>
> On Wed, Nov 25, 2009 at 8:45 PM, Murali Krishna. P
> <muralikpb...@yahoo.com> wrote:
>> Thanks JD for the detailed reply.
>>
>> Does the underlying Java API currently block if a region is not
>> available? I would like the Java call to give an immediate retry
>> indication in such cases, so that I can redirect the request to the
>> duplicate table in the other data center. Can this be supported?
>>
>> Thanks,
>> Murali Krishna
>>
>>
>> ________________________________
>> From: Andrew Purtell <apurt...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Thu, 26 November, 2009 12:17:30 AM
>> Subject: Re: HBase High Availability
>>
>> First, there is work under way for 0.21 which will shorten the time
>> necessary for region redeployment. Part of the delay in 0.20 is
>> less-than-ideal performance by the master in that regard.
>>
>> Beyond that, as a general operational principle, I recommend that you
>> host no more than 200-250 regions per region server. The Bigtable paper
>> talks about each tablet server hosting only 100 tablets, with only 200 MB
>> of data each. While that is not cost effective for folks who do not build
>> their own hardware in bulk, it should make you think about why:
>> - Limiting the number of regions per tablet server limits the time to
>> recovery upon node failure -- you can engineer this to be within some
>> threshold.
>> - Limiting the amount of data per region means that servers with
>> reasonable RAM can cache and serve much of the data out of memory, for
>> sub-disk data access latencies.
>>
>> So the advice here is to opt for more servers, not fewer; more RAM, not
>> less; and smaller disks, not larger.
>>
>> You should also consider the impact of server failure on HDFS -- the loss
>> of block replicas. For each under-replicated block, HDFS must work to
>> make additional copies. This can come at a bad time if the loss of the
>> blocks in the first place was due to overloading.
>> Smaller disks mean fewer lost block replicas. For example, attach 4 x
>> 160 GB drives as JBOD (as opposed to 4 x 1 TB or similar). Losing one
>> disk then means losing only 160 GB worth of block replicas (as opposed to
>> 1 TB). Losing a whole server means losing only 640 GB worth of block
>> replicas (as opposed to 4 TB).
>> You can also consider attaching 6 or 8 or even more modestly sized disks
>> per server to increase I/O parallelism (number of spindles) while still
>> constraining the amount of block replica loss per disk failure.
>>
>> Even so, blocked reads and writes over some interval during region
>> redeployment, whether due to server failure or load rebalancing, are part
>> of the Bigtable architecture, and so part of HBase. We could take
>> additional steps such as setting up active-passive region server pairs,
>> but that would have complications affecting consistency and performance,
>> and might not provide enough benefit anyway (there is still the time
>> needed to detect failure and fail over). This is not an unavailability of
>> the Bigtable service: other regions are not affected. It is graceful,
>> proportional degradation of service in the face of partial failures.
>>
>> There are other alternatives to Bigtable which degrade differently given
>> partial failures. Such options can give you no waiting on the write path
>> at any time, and possibly no waiting on the read path, but you lose
>> strong consistency as the trade-off. So you may get stale answers over
>> some (unbounded, IIRC) period, but that is the choice you make.
>>
>> HBase also has options like Stargate or the Thrift connector, which can
>> block and retry on behalf of your clients so that the clients themselves
>> are never blocked for writes. For the read path, I could look at having
>> Stargate serve (possibly stale) answers out of a cache -- with some flag
>> that indicates noncanonical state -- if that would be useful, and/or
>> return an immediate "try again" indication, so your clients are at least
>> not stalled.
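>>
>> To make the "not stalled" point concrete, here is an illustrative (not
>> tested) Java fragment for reading a cell through Stargate; the host,
>> port, and table layout are made up. The gateway absorbs the blocking and
>> retrying against the cluster; the client keeps its own short timeout and
>> treats a timeout as an immediate "try again":
>>
>> import java.io.ByteArrayOutputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.net.HttpURLConnection;
>> import java.net.SocketTimeoutException;
>> import java.net.URL;
>>
>> public class StargateCellReader {
>>
>>   // Stargate cell URL layout: /<table>/<row>/<column>
>>   private static final String CELL_URL =
>>       "http://stargate.example.com:8080/mytable/row1/info:col1";
>>
>>   /** Returns the cell value, or null as a "try again later" signal. */
>>   public static byte[] readCell() throws IOException {
>>     HttpURLConnection conn =
>>         (HttpURLConnection) new URL(CELL_URL).openConnection();
>>     conn.setConnectTimeout(1000); // ms: bound the wait, do not stall
>>     conn.setReadTimeout(1000);
>>     conn.setRequestProperty("Accept", "application/octet-stream");
>>     try {
>>       if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
>>         return null; // cell not there (or not served right now)
>>       }
>>       InputStream in = conn.getInputStream();
>>       try {
>>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>>         byte[] buf = new byte[4096];
>>         int n;
>>         while ((n = in.read(buf)) > 0) {
>>           out.write(buf, 0, n);
>>         }
>>         return out.toByteArray();
>>       } finally {
>>         in.close();
>>       }
>>     } catch (SocketTimeoutException e) {
>>       return null; // gateway is still working; caller is not blocked
>>     } finally {
>>       conn.disconnect();
>>     }
>>   }
>> }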
>>
>> Best regards,
>>
>>    - Andy
>>
>>
>> ________________________________
>> From: Murali Krishna. P <muralikpb...@yahoo.com>
>> To: hbase-user@hadoop.apache.org
>> Sent: Wed, November 25, 2009 1:31:45 AM
>> Subject: HBase High Availability
>>
>> Hi,
>>    This is regarding region unavailability when a region server goes
>> down. There will be cases where we have thousands of regions per RS, and
>> it takes a considerable amount of time to redistribute the regions when a
>> node fails. The service will be unavailable during that period. I am
>> evaluating HBase for an application where we need to guarantee close to
>> 100% availability (the namenode is still a SPOF; leave that aside).
>>
>> One simple idea would be to replicate the regions in memory. Can we load
>> the same region in multiple region servers? I am not sure about the
>> feasibility yet; there will be issues like consistency across these
>> in-memory replicas. I wanted to know whether there are any thoughts or
>> work already going on in this area? I saw some related discussion at
>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html, but I
>> am not sure what its status is.
>>
>> Does the same need to be done for the master as well, or is that already
>> handled via ZK? How fast are master re-election and catalog load
>> currently? Do we always have multiple masters in a ready-to-run state?
>>
>>
>> Thanks,
>> Murali Krishna

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: im...@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557