[ https://issues.apache.org/jira/browse/HBASE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915396#comment-16915396 ]

ranpanfeng commented on HBASE-22918:
------------------------------------

What you describe is exactly what I observed. Yes, the recoverLease invocation 
is a fencing point: only one DFSClient can hold the lease, so there is only a 
single writer appending and flushing the HLog. So there is no problem with 
write operations; however, linearizable consistency cannot be guaranteed on a 
single row. The event history is as follows.

t0: rs#0 owns region#0; an HBase client A holds a long-lived connection to rs#0.

t1: an NP (network partition) fault occurs between rs#0 and ZK.

t2: the ephemeral node of rs#0 is removed after ZK fails to receive heartbeats 
from rs#0 and the ZK session times out.

t3: the watcher on the active master is notified, and the master then reassigns region#0 to rs#1.

t4: someone mutates row#0 of region#0, which now resides on rs#1.

t5: HBase client A reads a stale version of row#0 from rs#0 via the long-lived 
connection.

t6: rs#0 times out, encounters YouAreDeadException, and then aborts itself.

t7: rs#0 shuts down.

 

At the t5 time point, does a stale read happen?
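To make the question concrete, here is a minimal sketch of what client A does 
at t0 and t5, using the standard HBase client API. The table SYSTEM:test and 
row#0 are taken from the scenario above, and the stale read assumes the 
client's cached region location for region#0 still points at rs#0:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class StaleReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // t0: client A opens a long-lived connection; the location of
        // region#0 is cached and points at rs#0.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("SYSTEM:test"))) {
          // t5: rs#0 has lost its ZK session and region#0 has been reassigned
          // to rs#1, but rs#0 has not aborted yet. The cached region location
          // still points at rs#0, so this Get is served by rs#0 and can miss
          // the mutation applied on rs#1 at t4, i.e. a stale read.
          Result result = table.get(new Get(Bytes.toBytes("row#0")));
          System.out.println(result);
        }
      }
    }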

> RegionServer violates failfast fault assumption
> -----------------------------------------------
>
>                 Key: HBASE-22918
>                 URL: https://issues.apache.org/jira/browse/HBASE-22918
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ranpanfeng
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> hbase 2.1.5 was tested and verified carefully before being deployed in our 
> production environment. We pay particular attention to NP (network partition) 
> faults, so NP fault injection tests were conducted in our test environment. 
> Some findings are described below.
> I used YCSB to write data into the table SYSTEM:test, which resides on 
> regionserver0. During the write workload, I used iptables to drop every 
> packet from regionserver0 to the zookeeper quorum. After the default 
> zookeeper.session.timeout (90 s), regionserver0 throws YouAreDeadException 
> after retrying to connect to zookeeper on a TimeoutException. regionserver0 
> then aborts itself, but before it can invoke completeFile on the WAL, the 
> active master has already considered regionserver0 dead prematurely and 
> invokes recoverLease to forcibly close the WAL of regionserver0.
> In a trusted IDC, distributed storage assumes that errors are always 
> failstop/failfast faults and that there are no Byzantine failures. So in the 
> above scenario, the active master should take over the WAL of regionserver0 
> only after regionserver0 has aborted successfully. According to the lease 
> protocol, the RS should abort within a lease period, the active master 
> should take over the WAL only after a grace period has elapsed, and the 
> invariant "lease period < grace period" should always hold. In 
> hbase-site.xml, only one config property, "zookeeper.session.timeout", is 
> provided. I think we should provide two properties:
>   1. regionserver.zookeeper.session.timeout
>   2. master.zookeeper.session.timeout
> An HBase admin can then tune regionserver.zookeeper.session.timeout to be 
> less than master.zookeeper.session.timeout; see the hbase-site.xml sketch 
> below. In this way, the failstop assumption is guaranteed.
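> For illustration, the two proposed properties could be set in hbase-site.xml 
> roughly as follows. These keys do not exist in HBase today; the names and 
> values are only a sketch of the proposal above, with the RS session timeout 
> deliberately shorter than the master's grace period:
>
>     <!-- Hypothetical keys from the proposal above, not existing HBase config. -->
>     <property>
>       <!-- Lease period: how long the RS may believe it still owns its regions. -->
>       <name>regionserver.zookeeper.session.timeout</name>
>       <value>60000</value>
>     </property>
>     <property>
>       <!-- Grace period: how long the master waits before fencing the WAL. -->
>       <name>master.zookeeper.session.timeout</name>
>       <value>90000</value>
>     </property>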



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
