[
https://issues.apache.org/jira/browse/HDFS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202846#comment-13202846
]
Bikas Saha commented on HDFS-2909:
----------------------------------
This is happening because JournalSet.mapJournalsAndReportErrors() calls
abortAllJournals() and throws new IOException when a required journal fails (in
this case, the shared dir). I still have to see why the NN continues to run as
active after this.
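The failure path described above can be modeled roughly as follows. This is a hypothetical, simplified sketch and not the actual Hadoop JournalSet code. The Journal class, its required/disabled/aborted flags, and the JournalOp interface are illustrative assumptions. It shows the behavior the comment describes: when a required journal fails, abortAllJournals() runs and an IOException is thrown. It also assumes that a failing optional journal is simply disabled, with no exception.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the journal-set error handling.
// Names mirror the comment where possible but are NOT Hadoop's real API.
class JournalSetSketch {
    // An operation applied to each journal (e.g. write or sync); may fail with IOException.
    interface JournalOp {
        void apply(Journal j) throws IOException;
    }

    static class Journal {
        final String name;
        final boolean required;   // e.g. the shared edits dir in an HA setup
        boolean disabled = false; // set when an optional journal fails
        boolean aborted = false;  // set when abortAllJournals() runs

        Journal(String name, boolean required) {
            this.name = name;
            this.required = required;
        }
    }

    final List<Journal> journals = new ArrayList<>();

    void abortAllJournals() {
        for (Journal j : journals) {
            j.aborted = true;
        }
    }

    // Applies op to every active journal. A required-journal failure aborts all
    // journals and throws; optional-journal failures only disable those journals.
    void mapJournalsAndReportErrors(JournalOp op, String status) throws IOException {
        List<Journal> failed = new ArrayList<>();
        for (Journal j : journals) {
            if (j.disabled) {
                continue;
            }
            try {
                op.apply(j);
            } catch (IOException e) {
                if (j.required) {
                    abortAllJournals();
                    throw new IOException(status + " failed for required journal " + j.name, e);
                }
                failed.add(j);
            }
        }
        for (Journal j : failed) {
            j.disabled = true;
        }
    }
}
```

In this model the exception only signals the failure; whether the NameNode process actually stops depends on how the caller handles it, which is exactly the open question in the comment.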
Coming back to the above, the abortAllJournals() call seems to imply that the
NN is expected to stop running when something like this happens. That would
mean that if the single shared edits dir becomes inaccessible, the active NN
will shut down. Most likely the standby NN will not be able to access the
shared edits dir either, which means that the shared edits dir has become a
single point of failure for the HA service.
Still looking at why NN did not abort.
> HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-2909
> URL: https://issues.apache.org/jira/browse/HDFS-2909
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
>