On Thu, Nov 26, 2009 at 6:19 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Probably around the same time as hadoop 0.21, in other words a few
> more months. There may be chances to run RCs before then though.
>
Thanks for the quick reply, Ryan. I am eagerly looking forward to trying the
RCs; since I am planning a deployment around April next year, the timing
works out perfectly!

Thanks,

- Imran

> -ryan
>
> On Thu, Nov 26, 2009 at 3:15 AM, Imran M Yousuf <imyou...@gmail.com> wrote:
>> On Thu, Nov 26, 2009 at 12:05 PM, Jean-Daniel Cryans
>> <jdcry...@apache.org> wrote:
>> <snip />
>>>
>>> Be also aware that we are planning to include master-slave
>>> replication between datacenters in 0.21.
>>>
>>
>> From this discussion and a presentation by Ryan Rawson and Jonathan
>> Gray I am really looking forward to the 0.21 release. Any idea on the
>> timeline?
>>
>> - Imran
>>
>>> J-D
>>>
>>> On Wed, Nov 25, 2009 at 8:45 PM, Murali Krishna. P
>>> <muralikpb...@yahoo.com> wrote:
>>>> Thanks JD for the detailed reply.
>>>>
>>>> Does the underlying Java API currently block in case a region is not
>>>> available? I would like to get an immediate retry indication from the
>>>> Java call in such cases so that I can redirect the request to the
>>>> duplicate table in the other data center. Can this be supported?
>>>>
>>>> Thanks,
>>>> Murali Krishna
>>>>
>>>>
>>>> ________________________________
>>>> From: Andrew Purtell <apurt...@apache.org>
>>>> To: hbase-user@hadoop.apache.org
>>>> Sent: Thu, 26 November, 2009 12:17:30 AM
>>>> Subject: Re: HBase High Availability
>>>>
>>>> First, there is work under way for 0.21 which will shorten the time
>>>> necessary for region redeployment. Part of the delay in 0.20 is
>>>> less-than-ideal performance in that regard by the master.
>>>>
>>>> Beyond that, just as a general operational principle, I recommend
>>>> that you host no more than 200-250 regions per region server. The
>>>> Bigtable paper talks about each tablet server hosting only 100
>>>> regions, with only 200 MB of data each. While that is not cost
>>>> effective for folks who do not build their own hardware in bulk, it
>>>> should cause you to think about why:
>>>> - Limiting the number of regions per tablet server limits time to
>>>> recovery upon node failure -- you can engineer this to be within
>>>> some threshold.
>>>> - Limiting the amount of data per region means that servers with
>>>> reasonable RAM can cache and serve a lot of the data out of memory,
>>>> for sub-disk data access latencies.
>>>>
>>>> So the advice here is to opt for more servers, not fewer; more RAM,
>>>> not less; and smaller disks, not larger.
>>>>
>>>> You should also consider the impact of server failure on HDFS --
>>>> loss of block replicas. For each under-replicated block, HDFS must
>>>> work to make additional copies. This can come at a bad time if loss
>>>> of the blocks in the first place was due to overloading.
>>>> Smaller disks mean fewer lost block replicas. For example, attach
>>>> 4 x 160 GB drives as JBOD (as opposed to 4 x 1 TB or similar).
>>>> Losing one disk means a loss of only 160 GB worth of block replicas
>>>> (as opposed to 1 TB). Loss of a whole server means losing only
>>>> 640 GB worth of block replicas (as opposed to 4 TB).
>>>> You can also consider attaching 6 or 8 or even more modest-sized
>>>> disks per server to increase the I/O parallelism (number of
>>>> spindles) while also constraining the amount of block replica loss
>>>> per disk failure.
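
One way to approximate the "immediate retry indication" Murali asks about
above, with the 0.20 client, is to shrink the client's retry budget and treat
RetriesExhaustedException as the signal to fail over to the duplicate table.
A rough, untested sketch -- the ZooKeeper quorum names and table name here
are placeholders, not anything from this thread:

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.RetriesExhaustedException;

  public class FailoverGet {
    public static Result get(byte[] row) throws IOException {
      HBaseConfiguration primary = new HBaseConfiguration();
      primary.set("hbase.zookeeper.quorum", "zk-dc1");   // placeholder quorum
      primary.setInt("hbase.client.retries.number", 2);  // fail fast, don't block long
      primary.setInt("hbase.client.pause", 500);         // ms between retries
      try {
        return new HTable(primary, "mytable").get(new Get(row));
      } catch (RetriesExhaustedException e) {
        // Primary cluster could not serve the region in time;
        // redirect to the duplicate table in the other data center.
        HBaseConfiguration standby = new HBaseConfiguration();
        standby.set("hbase.zookeeper.quorum", "zk-dc2"); // placeholder quorum
        return new HTable(standby, "mytable").get(new Get(row));
      }
    }
  }

Note that "immediate" here really means "after a small, bounded number of
retries": the wait is governed by hbase.client.retries.number and
hbase.client.pause rather than being zero.
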
>>>>
>>>> Even so, blocked reads and writes over some interval during region
>>>> redeployment, due to server failure or load rebalancing, are part of
>>>> the Bigtable architecture, and so of HBase, unless we take
>>>> additional steps such as setting up active-passive region server
>>>> pairs; but that would have complications which affect consistency
>>>> and performance and might not provide enough benefit anyway (there
>>>> is still time needed to detect failure and fail over). This is not
>>>> an unavailability of the Bigtable service. Other regions are not
>>>> affected. This is graceful/proportional service degradation in the
>>>> face of partial failures. There are other alternatives to Bigtable
>>>> which degrade differently given partial failures. Such options can
>>>> give you no waiting on the write path at any time, and possibly no
>>>> waiting on the read path, but you will lose strong consistency as
>>>> the trade-off. So you may get stale answers over some (unbounded,
>>>> iirc) period, but this is the choice you make.
>>>>
>>>> HBase also has options like Stargate or the Thrift connector which
>>>> can block and retry on behalf of your clients so they are never
>>>> blocked for writes. For read path options I could look at having
>>>> Stargate serve (possibly stale) answers out of a cache -- with some
>>>> flag that indicates noncanonical state -- if that would be useful,
>>>> and/or return an immediate "try again" indication, so your clients
>>>> are at least not stalled.
>>>>
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>>
>>>> ________________________________
>>>> From: Murali Krishna. P <muralikpb...@yahoo.com>
>>>> To: hbase-user@hadoop.apache.org
>>>> Sent: Wed, November 25, 2009 1:31:45 AM
>>>> Subject: HBase High Availability
>>>>
>>>> Hi,
>>>> This is regarding the region unavailability when a region server
>>>> goes down. There will be cases where we have thousands of regions
>>>> per RS, and it takes a considerable amount of time to redistribute
>>>> the regions when a node fails. The service will be unavailable
>>>> during that period. I am evaluating HBase for an application where
>>>> we need to guarantee close to 100% availability (the namenode is
>>>> still a SPOF; leave that aside).
>>>>
>>>> One simple idea would be to replicate the regions in memory. Can we
>>>> load the same region in multiple region servers? I am not sure about
>>>> the feasibility yet; there will be issues like consistency across
>>>> these in-memory replicas. I wanted to know whether there are any
>>>> thoughts / work already going on in this area? I saw some related
>>>> discussion here:
>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html,
>>>> but I am not sure what its current state is.
>>>>
>>>> Does the same need to be done for the master as well, or is that
>>>> already handled with ZK? How fast are master re-election and catalog
>>>> loading currently? Do we always have multiple masters in a
>>>> ready-to-run state?
>>>>
>>>> Thanks,
>>>> Murali Krishna
>>>
>>
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: im...@smartitengineering.com
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: im...@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
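
As an illustration of the read-path idea Andrew mentions above -- serving a
possibly stale answer out of a cache, flagged as noncanonical, rather than
stalling the caller -- here is a rough, hypothetical sketch. This is not
existing Stargate behaviour; the class name and table name are made up for
the example:

  import java.io.IOException;
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CachingGateway {
    /** Wraps a Result with a flag saying whether it came from the cache. */
    public static class Answer {
      public final Result result;   // may be null if nothing was ever cached
      public final boolean stale;   // true = served from cache, may be out of date
      Answer(Result r, boolean s) { result = r; stale = s; }
    }

    private final HTable table;
    private final Map<String, Result> cache =
        new ConcurrentHashMap<String, Result>();

    public CachingGateway(String tableName) throws IOException {
      this.table = new HTable(new HBaseConfiguration(), tableName);
    }

    public Answer get(byte[] row) {
      try {
        Result fresh = table.get(new Get(row));  // may block while a region moves
        cache.put(Bytes.toString(row), fresh);   // remember the canonical answer
        return new Answer(fresh, false);
      } catch (IOException e) {
        // Region (or its server) is currently unavailable: hand back the last
        // value we saw, flagged as noncanonical, instead of stalling the caller.
        return new Answer(cache.get(Bytes.toString(row)), true);
      }
    }
  }

Whether a flagged stale read is acceptable is an application decision; it
trades away the strong-consistency guarantee in exactly the way Andrew
describes for the non-Bigtable alternatives.
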