[ https://issues.apache.org/jira/browse/HBASE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189646#comment-14189646 ]

stack commented on HBASE-12336:
-------------------------------

bq. But I am wondering why this thread did not exit even though we called
shutdown on the executor during shutdown.

Maybe we need to look at server.isStopped inside ReplicationSourceManager more 
often than we do?
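A minimal sketch of that idea in plain Java, for illustration only: the worker re-checks a stop flag before each unit of work, so a shutdown request is noticed between steps. `StopSignal`, `StopAwareWorker`, and `copyQueues` are hypothetical names standing in for the server's `isStopped()` check and the per-queue failover work; this is not the ReplicationSourceManager code. It also cannot help while a single call (such as the ZooKeeper multi in the trace) is itself blocked.

```java
import java.util.ArrayList;
import java.util.List;

class StopAwareWorker {

    /** Hypothetical stand-in for a server-wide stop flag such as server.isStopped(). */
    interface StopSignal {
        boolean isStopped();
    }

    /**
     * Processes queues one at a time and re-checks the stop flag before each one,
     * so a shutdown request is noticed between steps. Returns the queues handled.
     */
    static List<String> copyQueues(List<String> queues, StopSignal server) {
        List<String> copied = new ArrayList<>();
        for (String queue : queues) {
            if (server.isStopped()) {
                break; // stop before starting another, possibly blocking, step
            }
            copied.add(queue); // placeholder for the real per-queue work
        }
        return copied;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Simulated server that reports "stopped" from the third check onward.
        StopSignal server = () -> ++checks[0] > 2;
        System.out.println(copyQueues(List.of("q1", "q2", "q3"), server));
    }
}
```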

Let's apply this patch. These threads should be daemon threads anyway. We can
raise a new issue if it comes up again.
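For illustration, one way to get daemon worker threads from a plain `java.util.concurrent` executor is a custom `ThreadFactory`. This is a sketch under that assumption; `DaemonThreadFactory` is a hypothetical name, and the actual patch may set the daemon flag differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
        // Daemon threads do not keep the JVM alive once all non-daemon threads
        // have finished, so a worker stuck in a blocking call cannot by itself
        // prevent the process from exiting.
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        ExecutorService executor =
            Executors.newSingleThreadExecutor(new DaemonThreadFactory("ReplicationExecutor"));
        executor.submit(() -> {
            Thread.sleep(Long.MAX_VALUE); // stands in for a call that never returns
            return null;
        });
        executor.shutdown(); // does not interrupt the running task
        System.out.println("main exiting; the blocked daemon worker does not hold the JVM open");
    }
}
```

Because the worker is a daemon thread, the JVM can exit once `main` returns even though the task is still blocked. The trade-off is that daemon threads are abandoned at exit without cleanup, so this keeps a stuck worker from blocking shutdown rather than unblocking it.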



> RegionServer failed to shutdown for NodeFailoverWorker thread
> -------------------------------------------------------------
>
>                 Key: HBASE-12336
>                 URL: https://issues.apache.org/jira/browse/HBASE-12336
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.11
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: HBASE-12336-trunk-v1.diff, stack
>
>
> After enabling hbase.zookeeper.useMulti in our HBase cluster, we found that the
> RegionServer failed to shut down. All other threads had exited except a
> NodeFailoverWorker thread.
> {code}
> "ReplicationExecutor-0" prio=10 tid=0x00007f0d40195ad0 nid=0x73a in Object.wait() [0x00007f0dc8fe6000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>         - locked <0x00000005a16df080> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> We are sure that the executor's shutdown method is called in
> ReplicationSourceManager#join.
>
> I am looking for the root cause; suggestions are welcome. Thanks.
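On the question of why calling shutdown was not enough: in plain `java.util.concurrent` terms, `ExecutorService.shutdown()` only stops the executor from accepting new tasks; it does not interrupt a task that is already running, whereas `shutdownNow()` interrupts the worker threads. The sketch below is a hypothetical demo, not HBase code: it blocks a task in an untimed `Object.wait()`, similar to the `WAITING (on object monitor)` state in the trace, and checks whether the executor terminates after each call. Whether the real ZooKeeper request returns promptly on interrupt is not shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownVsShutdownNow {

    /**
     * Starts a task that blocks in Object.wait(), then reports whether the executor
     * terminated after shutdown() and after shutdownNow().
     * Returns {terminatedAfterShutdown, terminatedAfterShutdownNow}.
     */
    static boolean[] demo() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean released = new AtomicBoolean(false); // never set to true
        Object lock = new Object();

        executor.submit(() -> {
            synchronized (lock) {
                started.countDown();
                // The loop guards against spurious wakeups; nothing ever notifies,
                // so only an interrupt ends this wait (by throwing InterruptedException).
                while (!released.get()) {
                    lock.wait();
                }
            }
            return null;
        });
        started.await();

        executor.shutdown(); // rejects new tasks, but the running task keeps waiting
        boolean afterShutdown = executor.awaitTermination(200, TimeUnit.MILLISECONDS);

        executor.shutdownNow(); // interrupts the worker thread
        boolean afterShutdownNow = executor.awaitTermination(5, TimeUnit.SECONDS);

        return new boolean[] {afterShutdown, afterShutdownNow};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("terminated after shutdown(): " + r[0]);
        System.out.println("terminated after shutdownNow(): " + r[1]);
    }
}
```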



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
