[ https://issues.apache.org/jira/browse/HBASE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915344#comment-16915344 ]

Sean Busbey commented on HBASE-22918:
-------------------------------------

I suspect this might be better suited to the mailing list dev@hbase. It's not 
clear to me if you're claiming you've found incorrect behavior or if you're 
asking about expected behavior.

Here's my attempt at describing the scenario I think you're setting up.

We have some master process, a region server process, and a ZooKeeper quorum 
that acts as a liveness check.

1) The client is talking to the RS process.
2) The RS process is properly writing to HDFS.
3) The RS process cannot talk to ZK for some window that begins after the 
client starts and lasts long enough to reach the ZK session timeout.

Have I accurately described things?

In that scenario, what should happen is:

1) The ZK node for the RS will expire because the RS is no longer heartbeating.
2) The master will see that this has happened and will forcefully recover the 
HDFS lease on the WALs for that RS (see the sketch after this list).
3) The master will then process recovery of those WALs and assign the regions 
elsewhere.
4) If the RS heartbeats to the master after #2, the master will send it a "you 
are dead" response and the RS should abort.
5) If the RS attempts to write to the WAL after #2, the write will fail because 
the RS no longer holds the lease, and the RS should abort.
6) A client attempting to send writes to the RS after #2 will also fail, 
because the write to the WAL will fail.
7) Presuming retries are configured correctly, the client will eventually send 
writes to whichever server the master picked in #3 (see the client-side sketch 
below).
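
To make #2 concrete, here's roughly what the forced lease recovery boils down 
to at the HDFS API level (a simplified sketch with illustrative names, not the 
actual recovery code in the master):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class WalLeaseRecoverySketch {
      // Ask the NameNode to forcibly recover the lease on a WAL file that a
      // presumed-dead region server may still hold open, and poll until the
      // file is closed and safe to read for WAL splitting.
      static void recoverWalLease(Configuration conf, Path walPath)
          throws IOException, InterruptedException {
        FileSystem fs = walPath.getFileSystem(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          return; // nothing to do on a filesystem without leases
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        while (!dfs.recoverLease(walPath)) {
          Thread.sleep(1000L); // lease recovery is asynchronous on the NameNode
        }
      }
    }

Once that lease recovery completes, the old RS can no longer successfully 
complete writes to that file, which is what makes #5 and #6 fail.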

Are you observing something other than the above?
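
For #6/#7, whether the client rides out the reassignment depends on its retry 
settings; a minimal sketch with illustrative values (and an assumed column 
family "f"), using the standard hbase.client.* properties:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RetryingWriteSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Give the client enough retries/pause to outlast ZK expiry plus WAL
        // splitting and region reassignment (illustrative values).
        conf.setInt("hbase.client.retries.number", 15);
        conf.setInt("hbase.client.pause", 1000);
        conf.setInt("hbase.client.operation.timeout", 180000);

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("SYSTEM:test"))) {
          Put put = new Put(Bytes.toBytes("row-1"));
          put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
          // Fails while the region is on the fenced RS, succeeds once it has
          // been reassigned, provided retries are not exhausted first.
          table.put(put);
        }
      }
    }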



> RegionServer violates failfast fault assumption
> -----------------------------------------------
>
>                 Key: HBASE-22918
>                 URL: https://issues.apache.org/jira/browse/HBASE-22918
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ranpanfeng
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> HBase 2.1.5 was tested and verified carefully before being deployed in our 
> production environment, and we pay particular attention to NP (network 
> partition) faults, so NP fault injection tests were conducted in our test 
> environment. Some findings are described below.
> I use YCSB to write data into table SYSTEM:test, which resides on 
> regionserver0; during the writing, I use iptables to drop every packet from 
> regionserver0 to the ZooKeeper quorum. After the default 
> zookeeper.session.timeout (90 s), regionserver0 throws a YouAreDeadException 
> once its retries to connect to ZooKeeper fail with a TimeoutException, and 
> then regionserver0 aborts itself. But before regionserver0 can invoke 
> completeFile on its WAL, the active master has already considered 
> regionserver0 dead prematurely, and so it invokes recoverLease to forcibly 
> close the WAL of regionserver0.
> In a trusted IDC, distributed storage assumes that errors are always 
> fail-stop/fail-fast faults and that there are no Byzantine failures. So in 
> the above scenario, the active master should take over the WAL of 
> regionserver0 only after regionserver0 has aborted successfully. According 
> to the lease protocol, the RS should abort within a lease period, the active 
> master should take over the WAL only after a grace period has elapsed, and 
> the invariant "lease period < grace period" should always hold. In 
> hbase-site.xml only one config property, "zookeeper.session.timeout", is 
> given; I think we should provide two properties:
>   1. regionserver.zookeeper.session.timeout
>   2. master.zookeeper.session.timeout
> An HBase admin can then tune regionserver.zookeeper.session.timeout to be 
> less than master.zookeeper.session.timeout. In this way, the fail-stop 
> assumption is guaranteed.
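>
> As an illustration of the proposal (these two properties do not exist in 
> current HBase; they are only proposed here), here is a sketch of how a 
> per-role timeout could be read on top of the existing 
> zookeeper.session.timeout:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>
>     public class SplitSessionTimeoutSketch {
>       // Hypothetical lookup: prefer the proposed role-specific key, fall
>       // back to the existing zookeeper.session.timeout (default 90000 ms).
>       static int sessionTimeoutFor(Configuration conf, boolean isMaster) {
>         int shared = conf.getInt("zookeeper.session.timeout", 90000);
>         String roleKey = isMaster
>             ? "master.zookeeper.session.timeout"
>             : "regionserver.zookeeper.session.timeout";
>         return conf.getInt(roleKey, shared);
>       }
>
>       public static void main(String[] args) {
>         Configuration conf = HBaseConfiguration.create();
>         // The invariant asked for above: RS lease period < master grace period.
>         conf.setInt("regionserver.zookeeper.session.timeout", 60000);
>         conf.setInt("master.zookeeper.session.timeout", 90000);
>         System.out.println("RS: " + sessionTimeoutFor(conf, false));
>         System.out.println("Master: " + sessionTimeoutFor(conf, true));
>       }
>     }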



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
