On Thu, Nov 26, 2009 at 12:05 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
<snip />
>
> Be also aware that we are planning to include a master-slave
> replication between datacenters in 0.21.
>
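Until that replication feature ships, a client that needs a second
datacenter has to do the redirection itself, along the lines Murali
describes below. As a rough, untested sketch against the 0.20 client API
(the quorum hosts, table name, and the whole fail-over wrapper here are
made up for illustration; HBase does not provide this out of the box):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedException;

public class DualClusterTable {

  private final HTable primary;
  private final HTable secondary;

  public DualClusterTable() throws IOException {
    // Primary cluster: shorten the retry window so an unavailable
    // region surfaces as an exception quickly instead of the client
    // blocking through the default ten retries.
    HBaseConfiguration primaryConf = new HBaseConfiguration();
    primaryConf.set("hbase.zookeeper.quorum", "zk1.dc1.example.com");
    primaryConf.setInt("hbase.client.retries.number", 2);
    primaryConf.setLong("hbase.client.pause", 500); // ms between retries

    // Second cluster in the other datacenter, hosting the duplicate table.
    HBaseConfiguration secondaryConf = new HBaseConfiguration();
    secondaryConf.set("hbase.zookeeper.quorum", "zk1.dc2.example.com");

    primary = new HTable(primaryConf, "mytable");
    secondary = new HTable(secondaryConf, "mytable");
  }

  public void put(Put put) throws IOException {
    try {
      primary.put(put);
    } catch (RetriesExhaustedException e) {
      // The region stayed unavailable through all retries; redirect
      // the write to the duplicate table in the other datacenter.
      secondary.put(put);
    }
  }
}

The trade-off is that keeping the two tables consistent becomes entirely
the application's problem, which is exactly what the planned replication
is meant to solve.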
From this discussion and a presentation by Ryan Rawson and Jonathan Gray,
I am really looking forward to the 0.21 release. Any idea on the timeline?

- Imran

> J-D
>
> On Wed, Nov 25, 2009 at 8:45 PM, Murali Krishna. P
> <muralikpb...@yahoo.com> wrote:
>> Thanks JD for the detailed reply.
>>
>> Does the underlying Java API currently block if a region is not
>> available? I would like the Java call to give an immediate retry
>> indication in such cases, so that I can redirect the request to the
>> duplicate table in the other data center. Can this be supported?
>>
>> Thanks,
>> Murali Krishna
>>
>>
>> ________________________________
>> From: Andrew Purtell <apurt...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Thu, 26 November, 2009 12:17:30 AM
>> Subject: Re: HBase High Availability
>>
>> First, there is work under way for 0.21 which will shorten the time
>> necessary for region redeployment. Part of the delay in 0.20 is
>> less-than-ideal performance by the master in that regard.
>>
>> Beyond that, as a general operational principle, I recommend that you
>> host no more than 200-250 regions per region server. The Bigtable paper
>> talks about each tablet server hosting only 100 tablets, with only 200 MB
>> of data each. While that is not cost effective for folks who do not build
>> their own hardware in bulk, it should make you think about why:
>> - Limiting the number of regions per tablet server limits the time to
>> recovery upon node failure -- you can engineer this to be within some
>> threshold.
>> - Limiting the amount of data per region means that servers with
>> reasonable RAM can cache and serve much of the data out of memory, for
>> sub-disk data access latencies.
>>
>> So the advice here is to opt for more servers, not fewer; more RAM, not
>> less; and smaller disks, not larger.
>>
>> You should also consider the impact of server failure on HDFS -- the loss
>> of block replicas. For each under-replicated block, HDFS must work to
>> make additional copies. This can come at a bad time if the loss of the
>> blocks in the first place was due to overloading.
>> Smaller disks mean fewer lost block replicas. For example, attach 4 x
>> 160 GB drives as JBOD (as opposed to 4 x 1 TB or similar). Losing one
>> disk then means losing only 160 GB worth of block replicas (as opposed to
>> 1 TB). Losing a whole server means losing only 640 GB worth of block
>> replicas (as opposed to 4 TB).
>> You can also consider attaching 6 or 8 or even more modestly sized disks
>> per server to increase I/O parallelism (number of spindles) while still
>> constraining the amount of block replica loss per disk failure.
>>
>> Even so, blocked reads and writes over some interval during region
>> redeployment, whether due to server failure or load rebalancing, are part
>> of the Bigtable architecture, and so part of HBase. We could take
>> additional steps such as setting up active-passive region server pairs,
>> but that would have complications affecting consistency and performance,
>> and might not provide enough benefit anyway (there is still the time
>> needed to detect failure and fail over). This is not an unavailability of
>> the Bigtable service: other regions are not affected. It is graceful,
>> proportional degradation of service in the face of partial failures.
>>
>> There are other alternatives to Bigtable which degrade differently given
>> partial failures. Such options can give you no waiting on the write path
>> at any time, and possibly no waiting on the read path, but you lose
>> strong consistency as the trade-off. So you may get stale answers over
>> some (unbounded, IIRC) period, but that is the choice you make.
>>
>> HBase also has options like Stargate or the Thrift connector, which can
>> block and retry on behalf of your clients so that the clients themselves
>> are never blocked for writes. For the read path, I could look at having
>> Stargate serve (possibly stale) answers out of a cache -- with some flag
>> that indicates noncanonical state -- if that would be useful, and/or
>> return an immediate "try again" indication, so your clients are at least
>> not stalled.
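>>
>> To make the "not stalled" point concrete, here is an illustrative (not
>> tested) Java fragment for reading a cell through Stargate; the host,
>> port, and table layout are made up. The gateway absorbs the blocking and
>> retrying against the cluster; the client keeps its own short timeout and
>> treats a timeout as an immediate "try again":
>>
>> import java.io.ByteArrayOutputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.net.HttpURLConnection;
>> import java.net.SocketTimeoutException;
>> import java.net.URL;
>>
>> public class StargateCellReader {
>>
>>   // Stargate cell URL layout: /<table>/<row>/<column>
>>   private static final String CELL_URL =
>>       "http://stargate.example.com:8080/mytable/row1/info:col1";
>>
>>   /** Returns the cell value, or null as a "try again later" signal. */
>>   public static byte[] readCell() throws IOException {
>>     HttpURLConnection conn =
>>         (HttpURLConnection) new URL(CELL_URL).openConnection();
>>     conn.setConnectTimeout(1000); // ms: bound the wait, do not stall
>>     conn.setReadTimeout(1000);
>>     conn.setRequestProperty("Accept", "application/octet-stream");
>>     try {
>>       if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
>>         return null; // cell not there (or not served right now)
>>       }
>>       InputStream in = conn.getInputStream();
>>       try {
>>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>>         byte[] buf = new byte[4096];
>>         int n;
>>         while ((n = in.read(buf)) > 0) {
>>           out.write(buf, 0, n);
>>         }
>>         return out.toByteArray();
>>       } finally {
>>         in.close();
>>       }
>>     } catch (SocketTimeoutException e) {
>>       return null; // gateway is still working; caller is not blocked
>>     } finally {
>>       conn.disconnect();
>>     }
>>   }
>> }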
>>
>> Best regards,
>>
>>    - Andy
>>
>>
>> ________________________________
>> From: Murali Krishna. P <muralikpb...@yahoo.com>
>> To: hbase-user@hadoop.apache.org
>> Sent: Wed, November 25, 2009 1:31:45 AM
>> Subject: HBase High Availability
>>
>> Hi,
>>    This is regarding region unavailability when a region server goes
>> down. There will be cases where we have thousands of regions per RS, and
>> it takes a considerable amount of time to redistribute the regions when a
>> node fails. The service will be unavailable during that period. I am
>> evaluating HBase for an application where we need to guarantee close to
>> 100% availability (the namenode is still a SPOF; leave that aside).
>>
>> One simple idea would be to replicate the regions in memory. Can we load
>> the same region in multiple region servers? I am not sure about the
>> feasibility yet; there will be issues like consistency across these
>> in-memory replicas. I wanted to know whether there are any thoughts or
>> work already going on in this area? I saw some related discussion at
>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html, but I
>> am not sure what its status is.
>>
>> Does the same need to be done for the master as well, or is that already
>> handled via ZK? How fast are master re-election and catalog load
>> currently? Do we always have multiple masters in a ready-to-run state?
>>
>>
>> Thanks,
>> Murali Krishna

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: im...@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557