[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176778#comment-13176778 ]
Jimmy Xiang commented on HBASE-5099:
------------------------------------

Cool, let me submit a patch.

> ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region server the root region is on
> ------------------------------------------------------------------------
>
>                 Key: HBASE-5099
>                 URL: https://issues.apache.org/jira/browse/HBASE-5099
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0, 0.94.0
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>         Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099.patch
>
>
> An RS died. The ServerShutdownHandler kicked in and started the log splitting.
> SplitLogManager installed the tasks asynchronously, then started to wait for
> them to complete. The task znodes were not actually created; the requests
> were only queued.
> At this point, the ZooKeeper connection expired. HMaster tried to recover the
> expired ZK session.
> During the recovery, a new ZooKeeper connection was created. However, this
> master became the new master again. It tried to assign root and meta.
> Because the dead RS had been hosting the old root region, the master needed
> to wait for the log splitting to complete.
> This wait holds the ZooKeeper event thread. So the async create of the split
> task is never retried, since there is only one event thread, and it is
> waiting for the root region to be assigned.
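The hang described above can be reproduced in miniature with plain java.util.concurrent, without HBase or ZooKeeper. The sketch below is only an illustration under stated assumptions: the single-threaded executor stands in for the one ZooKeeper event thread, the latch stands in for "log splitting finished, root region can be assigned", and the class and method names are hypothetical. It is not the code in hbase-5099.patch, and a timeout is used so the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SingleEventThreadStall {

    /**
     * Returns true if the wait timed out, i.e. the queued retry never ran
     * while the waiter was blocked.
     */
    public static boolean stalls(boolean waitOnEventThread, long waitMillis)
            throws Exception {
        // Hypothetical stand-in for the single ZooKeeper event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Hypothetical stand-in for "split tasks created and completed".
        CountDownLatch splitDone = new CountDownLatch(1);
        try {
            if (waitOnEventThread) {
                Future<Boolean> callback = eventThread.submit(() -> {
                    // The retry is queued on the same thread that is
                    // about to block, so it cannot start yet.
                    eventThread.submit(splitDone::countDown);
                    // Blocks the only event thread until the timeout.
                    return splitDone.await(waitMillis, TimeUnit.MILLISECONDS);
                });
                return !callback.get();
            }
            // Contrast case: the wait happens on the caller's thread,
            // leaving the event thread free to run the retry.
            eventThread.submit(splitDone::countDown);
            return !splitDone.await(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            eventThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wait on event thread stalls: " + stalls(true, 200));
        System.out.println("wait off event thread stalls: " + stalls(false, 2000));
    }
}
```

The contrast case shows one general way out of this kind of stall, which is to keep blocking waits off the thread that must deliver the events being waited for. Whether the attached patch takes that approach is not stated in this message.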