[ https://issues.apache.org/jira/browse/HBASE-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-11922.
-----------------------------------------
    Resolution: Invalid

> copyQueuesFromRSUsingMulti may fail to clean up properly if zk.useMulti is 
> true and there are orphaned queues
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11922
>                 URL: https://issues.apache.org/jira/browse/HBASE-11922
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.6
>            Reporter: Andrew Kyle Purtell
>            Priority: Minor
>
> To reproduce: set hbase.zookeeper.useMulti to true in the site configuration,
> start an all-localhost cluster, create a table with a column family (CF) whose
> replication scope is 1, add a peer (it does not have to be a live endpoint),
> remove the peer, and restart the regionserver. Observe:
> {noformat}
> 2014-09-09 13:39:23,497 WARN  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Got exception in copyQueuesFromRSUsingMulti:
> org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>       at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>       at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:620)
>       at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1530)
>       at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.copyQueuesFromRSUsingMulti(ReplicationQueuesZKImpl.java:335)
>       at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueues(ReplicationQueuesZKImpl.java:167)
>       at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:520)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This happens because an orphaned queue was left behind.
> If ZK rolls back state after a failed multi (this needs checking, but assume
> so for now), then the other ops that copyQueuesFromRSUsingMulti bundles into
> the multi-op will be rolled back as well, which may not be what we want.
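ZooKeeper documents ZooKeeper.multi() as executing all of its operations or none of them. That is the rollback behaviour the description above raises. The sketch below is a hypothetical in-memory model of that all-or-nothing contract. It is not the ZooKeeper or HBase API: the class name, the op types, and the znode paths (`/rs/dead`, `/rs/live`, `/rs/dead/orphan`) are illustrative assumptions, not HBase's real znode layout. It shows how a delete of a parent that still holds an orphaned child fails with a "not empty" error, and how that failure also discards the creates bundled into the same batch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory model of an all-or-nothing znode batch (not the ZooKeeper API). */
public class AtomicMultiModel {

    /** One operation in a batch: create or delete a single path. */
    public static final class Op {
        final boolean isCreate;
        final String path;
        private Op(boolean isCreate, String path) { this.isCreate = isCreate; this.path = path; }
        public static Op create(String path) { return new Op(true, path); }
        public static Op delete(String path) { return new Op(false, path); }
    }

    /** Thrown when a delete targets a node that still has children. */
    public static class NotEmptyException extends Exception {
        public NotEmptyException(String path) { super("Directory not empty: " + path); }
    }

    // Existing node paths, sorted so that all children of a node are contiguous.
    private final TreeSet<String> nodes = new TreeSet<>();

    public AtomicMultiModel(String... initial) { nodes.addAll(Arrays.asList(initial)); }

    public boolean exists(String path) { return nodes.contains(path); }

    // True if any path in the tree starts with "path/".
    private static boolean hasChild(TreeSet<String> tree, String path) {
        String prefix = path + "/";
        String candidate = tree.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    /** Applies every op to a staged copy; commits only if all ops succeed. */
    public void multi(List<Op> ops) throws NotEmptyException {
        TreeSet<String> staged = new TreeSet<>(nodes);
        for (Op op : ops) {
            if (op.isCreate) {
                staged.add(op.path);
            } else {
                if (hasChild(staged, op.path)) {
                    throw new NotEmptyException(op.path); // staged changes are dropped
                }
                staged.remove(op.path);
            }
        }
        nodes.clear();
        nodes.addAll(staged); // commit the whole batch at once
    }

    public static void main(String[] args) {
        // Hypothetical layout: a dead server with one live queue and one orphaned queue.
        AtomicMultiModel zk = new AtomicMultiModel(
            "/rs", "/rs/dead", "/rs/dead/peer1", "/rs/dead/peer1/log1",
            "/rs/dead/orphan", "/rs/dead/orphan/log2", "/rs/live");
        // Copy peer1's queue to the live server, then delete the dead server's znodes.
        List<Op> copyQueues = Arrays.asList(
            Op.create("/rs/live/peer1-dead"),
            Op.create("/rs/live/peer1-dead/log1"),
            Op.delete("/rs/dead/peer1/log1"),
            Op.delete("/rs/dead/peer1"),
            Op.delete("/rs/dead")); // fails: /rs/dead/orphan still exists
        try {
            zk.multi(copyQueues);
        } catch (NotEmptyException e) {
            System.out.println("multi failed: " + e.getMessage());
        }
        // The creates were discarded along with the failing delete.
        System.out.println(zk.exists("/rs/live/peer1-dead"));
        System.out.println(zk.exists("/rs/dead/peer1/log1"));
    }
}
```

In this model, a batch that never touches the orphaned queue cannot delete the dead server's parent znode. Because the batch is atomic, the failure also undoes the copy of the queue that was otherwise movable. This illustrates why bundling cleanup with the copy may not be what is wanted.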



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
