[
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943637#comment-15943637
]
Ted Yu commented on HBASE-17287:
--------------------------------
Performed the above procedure on a 1.1 cluster patched with 17287.branch-1.v3.txt.
Once the meta server was killed, I observed the following in the master log:
{code}
2017-03-27 16:52:01,080 FATAL [MASTER_SERVER_OPERATIONS-cn013:16000-1] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2017-03-27 16:52:01,080 FATAL [MASTER_SERVER_OPERATIONS-cn013:16000-1] master.HMaster: Shutting down HBase cluster: file system not available
java.io.IOException: File system is in safemode, it can't be written now
    at org.apache.hadoop.hbase.util.FSUtils.checkDfsSafeMode(FSUtils.java:561)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkFileSystem(MasterFileSystem.java:202)
    at org.apache.hadoop.hbase.master.MasterFileSystem.getLogDirs(MasterFileSystem.java:372)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:425)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:402)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:319)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
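For readers who want the shape of what the trace shows, here is a minimal sketch, not the contents of the attached patch. It assumes only what the log suggests: a filesystem check that fails with an IOException is turned into a master abort, so the process does not linger without a usable filesystem. The names FsCheckSketch, FsProbe and Abortable are hypothetical stand-ins, not HBase classes, and the abort message is illustrative.
{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-ins only; none of these types are HBase classes. */
final class FsCheckSketch {

    /** Hypothetical probe that throws if the filesystem is unusable. */
    interface FsProbe {
        void check() throws IOException;
    }

    /** Hypothetical hook used to stop the server. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Runs the probe; on failure, asks the server to abort.
     * Returns true only if the filesystem looked usable.
     */
    static boolean checkFileSystem(FsProbe probe, Abortable server) {
        try {
            probe.check();
            return true;
        } catch (IOException e) {
            // Assumption: aborting is preferable to running on as a zombie.
            server.abort("file system not available", e);
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean aborted = new AtomicBoolean(false);
        Abortable server = (why, cause) -> {
            aborted.set(true);
            System.out.println("abort: " + why + " (" + cause.getMessage() + ")");
        };
        // Simulate the failure mode from the issue description.
        FsProbe closedFs = () -> { throw new IOException("Filesystem closed"); };
        boolean usable = checkFileSystem(closedFs, server);
        System.out.println("usable=" + usable + ", aborted=" + aborted.get());
    }
}
{code}
The probe and the abort hook are passed in as interfaces so the failure path can be exercised without a real HDFS client.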
> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.v3.txt, 17287.master.v2.txt,
> 17287.master.v3.txt, 17287.v2.txt
>
>
> We have seen an issue whereby, if HDFS is unstable and the HBase master's
> HDFS client is unable to stabilize within
> {{dfs.client.failover.max.attempts}} attempts, the master's filesystem object
> closes. This seems to result in an HBase master which continues to run
> (process and znode exist) but can do no meaningful work (e.g. assigning
> meta).
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)