[
https://issues.apache.org/jira/browse/HBASE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915396#comment-16915396
]
ranpanfeng commented on HBASE-22918:
------------------------------------
What you describe is exactly what I observed. Yes, the recoverLease invocation
is a fence point: only one DFSClient can hold the lease, so there is only a
single writer appending to and flushing the WAL (HLog), and write operations
are therefore safe. However, linearizable consistency cannot be guaranteed for
reads of a single row. The event history is as follows (a sketch of the
fencing step follows the timeline):
t0: rs#0 owns region#0; an HBase client A holds a long-lived connection to rs#0.
t1: an NP fault occurs between rs#0 and ZooKeeper.
t2: the ephemeral node of rs#0 is removed after ZooKeeper stops receiving
heartbeats from rs#0 and the session times out.
t3: the watcher on the active master is notified, and the master reassigns
region#0 to rs#1.
t4: someone mutates row#0 of region#0, which now resides on rs#1.
t5: HBase client A reads a stale version of row#0 from rs#0 via the long-lived
connection.
t6: rs#0's session times out, it encounters YouAreDeadException, and then it
kills itself.
t7: rs#0 shuts down.
At time t5, does a stale read happen?
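For reference, the fence point mentioned above is HDFS lease recovery on the
dead RS's WAL. Below is a minimal sketch of that step, assuming an HDFS-backed
WAL; the class and method names here are illustrative, not the actual HBase
code path:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalFencingSketch {
  /** Force HDFS lease recovery on a dead RS's WAL so the old writer is fenced out. */
  static void fenceWal(Configuration conf, Path walPath) throws Exception {
    FileSystem fs = walPath.getFileSystem(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      return; // nothing to fence on a local filesystem
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    // recoverLease revokes the previous DFSClient's lease; once it returns true,
    // the old RegionServer can no longer append to or complete this WAL file.
    while (!dfs.recoverLease(walPath)) {
      Thread.sleep(1000); // retry until the NameNode finishes block recovery
    }
  }
}
{code}
This fences writes, but it does nothing about reads still being served by the
not-yet-dead rs#0, which is the point of the timeline above.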
> RegionServer violates failfast fault assumption
> -----------------------------------------------
>
> Key: HBASE-22918
> URL: https://issues.apache.org/jira/browse/HBASE-22918
> Project: HBase
> Issue Type: Bug
> Reporter: ranpanfeng
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> HBase 2.1.5 was tested and verified carefully before being deployed in our
> production environment. We pay particular attention to NP (network partition)
> faults, so NP fault injection tests were conducted in our test environment.
> Some findings are described below.
> I use YCSB to write data into table SYSTEM:test, which resides on
> regionserver0; during the write, I use iptables to drop every packet from
> regionserver0 to the ZooKeeper quorum. After the default
> zookeeper.session.timeout (90 seconds), regionserver0 throws
> YouAreDeadException once its retries to connect to ZooKeeper fail with
> TimeoutException, and then kills itself. But before regionserver0 invokes
> completeFile on its WAL, the active master has already considered
> regionserver0 dead prematurely and invokes recoverLease to forcibly close
> regionserver0's WAL.
> In a trusted IDC, distributed storage assumes that errors are always
> failstop/failfast faults and that there are no Byzantine failures. So in the
> scenario above, the active master should take over regionserver0's WAL only
> after regionserver0 has killed itself successfully. According to the lease
> protocol, the RS should kill itself within a lease period, the active master
> should take over the WAL only after a grace period has elapsed, and the
> invariant "lease period < grace period" should always hold. In
> hbase-site.xml, only one config property, "zookeeper.session.timeout", is
> provided; I think we should provide two properties:
> 1. regionserver.zookeeper.session.timeout
> 2. master.zookeeper.session.timeout
> HBase admins could then tune regionserver.zookeeper.session.timeout to be
> less than master.zookeeper.session.timeout. In this way, the failstop
> assumption is guaranteed (a sketch of such a configuration follows).
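> A minimal hbase-site.xml sketch of this proposal. Note that both property
> names below are only the ones proposed here (they do not exist in current
> HBase releases), and the values are illustrative:
> {code:xml}
> <!-- Proposed, not an existing HBase property: RS lease period (ms). -->
> <property>
>   <name>regionserver.zookeeper.session.timeout</name>
>   <value>60000</value>
> </property>
> <!-- Proposed, not an existing HBase property: master grace period (ms);
>      must be larger than the RS lease period. -->
> <property>
>   <name>master.zookeeper.session.timeout</name>
>   <value>90000</value>
> </property>
> {code}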
--
This message was sent by Atlassian Jira
(v8.3.2#803003)