[
https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849978#comment-13849978
]
Enis Soztutar commented on HBASE-10070:
---------------------------------------
Update: as discussed, we have put up a proof-of-concept implementation of a
working end-to-end scenario, and would like to share it to get some early
reviews and feedback. If you are interested in the technical side of the
changes, please check out the patch/branch. Please note that the patches and
the branch are far from clean and complete, but they are clean enough to
convey the scope of the changes and the areas that are touched. This also
contains the end-to-end client-side APIs (except for execution policies). We
will continue to work on the patches to get them into a more mature state, and
will recreate and clean them up for review, but reviews and comments are
welcome at any stage. We will keep pushing changes to this repo/branch until
the patches are in a more stable state, at which point we will clean up and
reshuffle the patches to be more consumable for review.
The code is in the GitHub repo https://github.com/enis/hbase.git, on the
branch hbase-10070-demo. This repository is based on 0.96.0 for now. I'll also
attach a patch which contains all the changes if you want to take a closer
look.
This can be built with:
{code}
git clone [email protected]:enis/hbase.git
cd hbase
git checkout hbase-10070-demo
MAVEN_OPTS="-Xmx2g" mvn clean install package assembly:single \
  -Dhadoop.profile=2.0 -DskipTests -Dmaven.javadoc.skip=true \
  -Dhadoop-two.version=2.2.0
{code}
The generated tarball will be hbase-assembly/target/hbase-0.96.0-bin.tar.gz.
The Hadoop version that should be used for real cluster testing is 2.2.0.
What's in the repository:
1. Client (Shell) changes
The shell has been modified so that tables with more than one replica per
region can be created:
create 't1', 'f1', {REGION_REPLICATION => 2}
One can also 'describe' a table, and the response string will include the
replica configuration:
describe 't1'
One can do a 'get' with the eventual-consistency flag set (we haven't
implemented the consistency semantics for the 'scan' family in this drop):
get 't2','r6',{"EVENTUAL_CONSISTENCY" => true}
[Note the quotes around the EVENTUAL_CONSISTENCY string. We will fix this soon
so it works without the quotes.]
Outside the shell, the API for declaring a willingness to tolerate eventual
consistency is Get.setConsistency, and the returned Result can be checked for
staleness via Result.isStale.
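To illustrate the intended client-side semantics, here is a small, self-contained Java model. This is not the actual HBase API: the Consistency enum values, the fallback logic, and all class names below are hypothetical stand-ins for Get.setConsistency and Result.isStale. The idea it sketches: a read is first served by the primary; if the primary is unavailable and the caller has opted into eventual consistency, a secondary replica serves the read and the result is flagged as stale.

```java
// Hypothetical model of the client-side semantics, NOT the actual HBase API.
import java.util.Optional;

public class EventualGetModel {
    // Stand-in for the value passed to Get.setConsistency (assumed names).
    enum Consistency { STRONG, EVENTUAL }

    // Stand-in for Result; 'stale' models Result.isStale.
    record Result(String value, boolean stale) {}

    static Optional<Result> get(String row, Consistency consistency,
                                boolean primaryUp, String replicaValue) {
        if (primaryUp) {
            // Primary is healthy: read is fresh regardless of the flag.
            return Optional.of(new Result(replicaValue, false));
        }
        if (consistency == Consistency.EVENTUAL) {
            // Primary is down but the caller tolerates staleness:
            // a secondary serves a possibly-stale value, flagged as stale.
            return Optional.of(new Result(replicaValue, true));
        }
        // Strong read fails while the primary is unavailable.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Result fresh = get("r6", Consistency.STRONG, true, "v1").orElseThrow();
        System.out.println(fresh.stale());

        Result maybeStale = get("r6", Consistency.EVENTUAL, false, "v1").orElseThrow();
        System.out.println(maybeStale.stale());
    }
}
```

The key contract is that staleness is surfaced to the caller via the result rather than hidden, so applications decide for themselves whether a stale value is acceptable.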
2. Master changes
The one main change here is the creation and management of replica
HRegionInfo objects. The other change is to make the StochasticLoadBalancer
aware of the replicas. During the assignment process, the AssignmentManager
consults the balancer for an assignment plan; here the StochasticLoadBalancer
ensures that the plan takes into account the constraint that a primary and its
secondaries are not assigned to the same server, nor to the same rack (if more
than one rack is configured).
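The placement constraint above can be sketched as a simple validity check. This is a hypothetical model, not the actual StochasticLoadBalancer cost-function code; ServerName here is a minimal stand-in for HBase's server identity.

```java
// Illustrative check of the replica placement constraint, not HBase code.
import java.util.List;

public class ReplicaPlacementCheck {
    // Minimal stand-in for a region server's identity and rack.
    record ServerName(String host, String rack) {}

    // Returns true iff no two replicas of a region share a server, and,
    // when more than one rack is configured, no two share a rack either.
    static boolean isValidPlacement(List<ServerName> replicas, int rackCount) {
        for (int i = 0; i < replicas.size(); i++) {
            for (int j = i + 1; j < replicas.size(); j++) {
                if (replicas.get(i).host().equals(replicas.get(j).host())) {
                    return false; // primary and secondary on the same server
                }
                if (rackCount > 1
                        && replicas.get(i).rack().equals(replicas.get(j).rack())) {
                    return false; // same rack, and multiple racks exist
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        var ok = List.of(new ServerName("rs1", "rackA"), new ServerName("rs2", "rackB"));
        var sameHost = List.of(new ServerName("rs1", "rackA"), new ServerName("rs1", "rackA"));
        System.out.println(isValidPlacement(ok, 2));
        System.out.println(isValidPlacement(sameHost, 2));
    }
}
```

Note the rack condition is only enforced when more than one rack is configured, matching the constraint as described above: on a single-rack cluster, replicas on distinct servers are still a valid placement.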
3. RegionServer changes
The one main change here is the ability to open regions in read-only mode. The
other change is the periodic refresh of store files. The configuration that
sets this up is (disabled by default):
{code}
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>2000</value>
</property>
{code}
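A rough sketch of what such a periodic refresh chore could look like, illustrative only and not HBase's actual implementation (all class and method names here are made up): every refresh period, a read-only region re-lists its store directory and atomically swaps in the new view, so files flushed by the primary become visible to reads on the secondary.

```java
// Illustrative sketch of a periodic store-file refresh, not HBase code.
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class StoreFileRefresher {
    // Current read view of the region's store files; volatile so readers
    // always see a complete, consistent snapshot.
    private volatile List<String> storeFiles = List.of();

    void refresh(List<String> filesOnDisk) {
        storeFiles = List.copyOf(filesOnDisk); // atomic swap of the view
    }

    List<String> currentFiles() {
        return storeFiles;
    }

    // Stand-in for listing the region's store directory on the filesystem.
    static List<String> listStoreDir() {
        return List.of("hfile-0001", "hfile-0002");
    }

    public static void main(String[] args) throws Exception {
        StoreFileRefresher secondary = new StoreFileRefresher();
        ScheduledExecutorService chore = Executors.newSingleThreadScheduledExecutor();
        // 2000 ms period, mirroring hbase.regionserver.storefile.refresh.period.
        chore.scheduleAtFixedRate(
            () -> secondary.refresh(listStoreDir()), 0, 2000, TimeUnit.MILLISECONDS);
        Thread.sleep(100);
        System.out.println(secondary.currentFiles());
        chore.shutdownNow();
    }
}
```

This is also why reads from a secondary are eventually consistent: until the next refresh tick, the secondary serves data only from the store files it last picked up.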
4. UI changes
The UIs corresponding to the tables' status and the regions' status have been
modified to indicate whether they have replicas.
There are unit tests - TestMasterReplicaRegions, TestRegionReplicas,
TestReplicasClient, TestBaseLoadBalancer, and some others.
There is also a manual test scenario to test out reads coming from the
secondary replica:
1. create an (at least) two-node cluster.
2. create a table with a replica count of 2. From the HBase shell:
create 't1', 'f1', {REGION_REPLICATION => 2}
3. arrange the regions so that the primary region is not co-located with the
meta or namespace regions. You can use move commands in the HBase shell for
that:
move '4392870ae8ef482406c272eec0312a02', '192.168.0.106,60020,1387069812919'
4. from the shell, do a couple of puts and then 'flush' the table:
{code}
hbase(main):005:0> for i in 1..100
hbase(main):006:1> put 't1', "r#{i}", 'f1:c1', i
hbase(main):007:1> end
hbase(main):009:0> flush 't1'
{code}
5. suspend the region server hosting the primary region replica by sending a
kill -STOP signal from bash:
{code}
kill -STOP <pid_of_region_server>
{code}
6. get a row from the table with the eventual-consistency flag set to true:
{code}
get 't1','r6',{"EVENTUAL_CONSISTENCY" => true}
{code}
7. a put to the table should fail.
Steps (6) and (7) should be done quickly enough, otherwise the master will
recover the region! (The default ZK session timeout is 90 seconds.)
> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>
> Key: HBASE-10070
> URL: https://issues.apache.org/jira/browse/HBASE-10070
> Project: HBase
> Issue Type: New Feature
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
>
>
> In the present HBase architecture, it is hard, probably impossible, to
> satisfy constraints like the 99th percentile of reads being served under 10
> ms. One of the major factors affecting this is the MTTR for regions. There
> are three phases in the MTTR process - detection, assignment, and recovery.
> Of these, detection is usually the longest and is presently on the order
> of 20-30 seconds. During this time, clients are not able to read the
> region data.
> However, some clients will be better served if regions are available for
> eventually consistent reads during recovery. This will help satisfy
> low-latency guarantees for the class of applications that can work with
> stale reads.
> For improving read availability, we propose a replicated read-only region
> serving design, also referred to as secondary regions, or region shadows.
> Extending the current model of a region being opened for reads and writes in
> a single region server, the region will also be opened for reading in other
> region servers. The region server which hosts the region for reads and
> writes (as in the current case) will be declared PRIMARY, while 0 or more
> region servers might host the region as SECONDARY. There may be more than
> one secondary (replica count > 2).
> Will attach a design doc shortly which contains most of the details and some
> thoughts about development approaches. Reviews are more than welcome.
> We also have a proof of concept patch, which includes the master and regions
> server side of changes. Client side changes will be coming soon as well.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)