[https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203157#comment-13203157]
Bikas Saha commented on HDFS-2912:
----------------------------------
>From what I read of the code, for some of the cases (such as a flush of logs)
>where the NN actually dies on shared dir hiccups, the runtime.exit() call was
>not added in the HA context. It was added when JournalSet was added by
>Jitendra long ago.
In any case, I would ideally like to have a cleaner shutdown mechanism to make
sure that exit(1) calls do not proliferate in hard-to-find places. Will let
[HDFS-2913|https://issues.apache.org/jira/browse/HDFS-2913] track that.
For now, I will add an exit(1) after the LOG.FATAL in
JournalSet.mapJournalsAndReportErrors(). This is the common code path through
which all journal operations (roll edit logs, flush, etc.) go, so putting
one here should hopefully catch all journal-related cases.
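As a rough illustration of why a single exit(1) in a shared choke point can cover flush, roll and other journal operations, here is a minimal Java sketch. Everything in it is hypothetical: JournalSetSketch, Journal, JournalOp, the `required` flag (standing in for the shared edits dir) and the injected `exitAction` are illustrative names, not Hadoop's actual JournalSet API. The fatal condition (a required journal failed, or no journal is left enabled) is also an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/**
 * Hypothetical, simplified model of the "one choke point" pattern.
 * This is NOT Hadoop's JournalSet; all names are illustrative.
 */
public class JournalSetSketch {

    /** One journal operation (flush, roll, ...) that may fail. */
    interface JournalOp {
        void apply(Journal journal) throws Exception;
    }

    /** Stand-in for an edit-log journal; `failing` simulates an inaccessible dir. */
    static class Journal {
        final String name;
        final boolean required; // assumption: e.g. the shared edits dir in HA
        final boolean failing;
        boolean disabled = false;

        Journal(String name, boolean required, boolean failing) {
            this.name = name;
            this.required = required;
            this.failing = failing;
        }

        void flush() throws Exception {
            if (failing) {
                throw new Exception("I/O error on " + name);
            }
        }
    }

    private final List<Journal> journals;
    // Injected so the caller decides what "exit(1)" means (System::exit in
    // production, a recording lambda in tests) instead of hard-wiring it.
    private final IntConsumer exitAction;

    JournalSetSketch(List<Journal> journals, IntConsumer exitAction) {
        this.journals = journals;
        this.exitAction = exitAction;
    }

    /**
     * Every journal operation goes through here: apply op to each enabled
     * journal, disable the ones that fail, and exit if the failure is fatal.
     */
    void mapJournalsAndReportErrors(JournalOp op, String status) {
        List<Journal> bad = new ArrayList<>();
        for (Journal j : journals) {
            if (j.disabled) {
                continue;
            }
            try {
                op.apply(j);
            } catch (Exception e) {
                System.err.println("ERROR: " + status + " failed for " + j.name
                        + ": " + e.getMessage());
                bad.add(j);
            }
        }
        boolean requiredFailed = false;
        for (Journal j : bad) {
            j.disabled = true;
            if (j.required) {
                requiredFailed = true;
            }
        }
        boolean anyEnabled = journals.stream().anyMatch(j -> !j.disabled);
        if (requiredFailed || !anyEnabled) {
            System.err.println("FATAL: " + status + " failed for a required journal"
                    + " or for all journals");
            // The fix discussed above: exit right after the fatal log line
            // instead of leaving the NameNode running as Active.
            exitAction.accept(1);
        }
    }
}
```

With a failing required journal, `mapJournalsAndReportErrors(Journal::flush, "flush")` invokes the exit action with status 1, while a failing optional journal is only disabled. Passing `System::exit` as the exit action reproduces a hard exit(1); a test can instead pass a lambda that records the status, which is one possible direction for the cleaner shutdown mechanism described above.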
> HA: Namenode not shutting down when shared edits dir is inaccessible
> --------------------------------------------------------------------
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
>
> When there is an error in the shared edits dir, the current policy requires the
> active name node to abort and shut down.
> Currently there is no way to shut down the name node and hence this does not
> happen even after all journals have been aborted on error. In fact the name
> node stays Active and also is not in safe mode. Ideally it should shut down,
> or at least go into safe mode or standby mode.