[
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944334#comment-15944334
]
Enis Soztutar commented on HBASE-17287:
---------------------------------------
Thanks, Ted, for the test. Why is it inside TestCreateTableProcedure? I don't
think it belongs there. Consider TestMasterFailover, TestMasterFileSystem, or
something similar.
Why are we aborting the regionserver? Is that the one running inside the
master? If the master abort does not cause the regionserver to abort, then
the issue is not fixed.
> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt,
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt,
> 17287.master.v4.txt, 17287.v2.txt
>
>
> We have seen an issue whereby if the HDFS is unstable and the HBase master's
> HDFS client is unable to stabilize before
> {{dfs.client.failover.max.attempts}} then the master's filesystem object
> closes. This seems to result in an HBase master which will continue to run
> (process and znode exists) but no meaningful work can be done (e.g. assigning
> meta).
>
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}
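
The trace in the description ends in a "java.io.IOException: Filesystem
closed" cause wrapped inside a retryable "failed log splitting" exception.
As an illustration only, and not the approach taken in the attached patches,
the sketch below shows one way such a condition could be recognized by walking
the cause chain. The class name FilesystemClosedCheck and the choice to match
on the message string are assumptions made for this example; it uses only the
JDK and no HBase or Hadoop APIs.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not part of HBase or Hadoop.
public final class FilesystemClosedCheck {
    // Message seen in the log in the issue description. Matching on the
    // message text is an assumption of this sketch.
    private static final String CLOSED_MESSAGE = "Filesystem closed";

    // Upper bound on how far the cause chain is followed, to guard against cycles.
    private static final int MAX_DEPTH = 32;

    private FilesystemClosedCheck() {}

    /**
     * Returns true if the throwable, or any throwable in its cause chain,
     * is an IOException whose message is exactly "Filesystem closed".
     */
    public static boolean isFilesystemClosed(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause()) {
            if (cur instanceof IOException && CLOSED_MESSAGE.equals(cur.getMessage())) {
                return true;
            }
            depth++;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the logged exception: a retryable wrapper
        // around the underlying "Filesystem closed" cause.
        IOException cause = new IOException("Filesystem closed");
        IOException wrapped = new IOException("failed log splitting, will retry", cause);

        System.out.println(isFilesystemClosed(wrapped));                              // prints "true"
        System.out.println(isFilesystemClosed(new IOException("connection reset"))); // prints "false"
    }
}
```

A caller that sees true here could treat the failure as fatal rather than
retrying, which is the behaviour the issue title asks about; where such a
check would live in the master is left open.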