[
https://issues.apache.org/jira/browse/HBASE-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358591#comment-15358591
]
Phil Yang commented on HBASE-16144:
-----------------------------------
If the RS gets "session expired", RecoverableZooKeeper will try to reconnect
instead of crashing the RS. If we use an ephemeral node for the lock, the lock
is gone after the reconnect, so more than one RS may copy the queue. In other
words, if the ephemeral node has disappeared, we cannot conclude that the
server has died.
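The race can be sketched with a small in-memory model. This is only an illustration under stated assumptions: FakeZk, the lock path, and the RS names are hypothetical stand-ins, not HBase or ZooKeeper APIs, and the model covers only the fact that session expiry deletes a session's ephemeral nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the ephemeral-lock race; not real HBase/ZooKeeper code. */
public class EphemeralLockRace {

    /** Models ephemeral znodes only: lock path -> id of the session that owns it. */
    static class FakeZk {
        private final Map<String, Long> ephemerals = new HashMap<>();

        /** Creates the ephemeral node if absent; returns true if the caller now holds it. */
        boolean tryLock(String path, long sessionId) {
            return ephemerals.putIfAbsent(path, sessionId) == null;
        }

        /** Session expiry removes every ephemeral node owned by that session. */
        void expireSession(long sessionId) {
            ephemerals.values().removeIf(owner -> owner == sessionId);
        }
    }

    /** Returns the region servers that end up copying the dead server's queue. */
    static List<String> simulate() {
        FakeZk zk = new FakeZk();
        List<String> copiers = new ArrayList<>();
        String lock = "/replication/rs/dead-rs/lock"; // hypothetical path

        // RS-A takes the lock under session 1 and starts copying the queue.
        boolean rsAThinksItHoldsLock = zk.tryLock(lock, 1L);

        // RS-A's session expires, but the process stays alive and reconnects later.
        zk.expireSession(1L);

        // RS-B finds no lock, takes it under session 2, and copies the queue.
        if (zk.tryLock(lock, 2L)) {
            copiers.add("RS-B");
        }

        // RS-A never noticed that its lock vanished, so it finishes its copy too.
        if (rsAThinksItHoldsLock) {
            copiers.add("RS-A");
        }
        return copiers;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [RS-B, RS-A]
    }
}
```

Both servers appear in the result, which is the duplicate copy described above: the missing ephemeral node told RS-B nothing about whether RS-A was still running.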
> Replication queue's lock will live forever if RS acquiring the lock has died
> prematurely
> ----------------------------------------------------------------------------------------
>
> Key: HBASE-16144
> URL: https://issues.apache.org/jira/browse/HBASE-16144
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.1, 1.1.5, 0.98.20
> Reporter: Phil Yang
> Assignee: Phil Yang
> Attachments: HBASE-16144-v1.patch, HBASE-16144-v2.patch
>
>
> By default, we use a multi operation when we claimQueues from ZK. But if
> we set hbase.zookeeper.useMulti=false, we add a lock first, then copy the
> nodes, and finally clean up the old queue and the lock.
> However, if the RS that acquired the lock crashes before claimQueues is done,
> the lock will remain forever and no other RS can ever claim the queue.