[
https://issues.apache.org/jira/browse/HDFS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron T. Myers updated HDFS-2010:
---------------------------------
Attachment: hdfs-2010.0.patch
Here's a patch which addresses the issue.
Unfortunately, I had to put a bunch of "try { // FS metadata op } catch
(LogSyncException e) { ... }" throughout the code because
{{FSEditLog.logEdit(...)}} potentially calls {{FSEditLog.logSync()}}, though
likely very rarely. I'd like to discuss the possibility of removing this call
to {{logSync()}} from {{logEdit(...)}}, which would simplify this code.
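
For illustration, the try/catch pattern described above might look roughly like the following. This is a minimal sketch and not code from the patch: {{SketchEditLog}}, {{SketchNamesystem}}, the buffer threshold, and the unchecked {{LogSyncException}} are all hypothetical stand-ins, and the handler body is invented for the example since the handling is not spelled out above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the exception named in the comment; assumed unchecked here.
class LogSyncException extends RuntimeException {
    LogSyncException(String msg) { super(msg); }
}

// Hypothetical, heavily simplified edit log (not the real FSEditLog).
class SketchEditLog {
    private final List<String> buffer = new ArrayList<>();
    private final int forceSyncThreshold;   // assumption: a full buffer forces a sync
    private boolean streamsHealthy = true;

    SketchEditLog(int forceSyncThreshold) { this.forceSyncThreshold = forceSyncThreshold; }

    void failAllStreams() { streamsHealthy = false; }

    // Buffers the edit and, occasionally, syncs inline. That inline sync is
    // why every caller of logEdit can see a LogSyncException.
    void logEdit(String op) {
        buffer.add(op);
        if (buffer.size() >= forceSyncThreshold) {
            logSync();
        }
    }

    void logSync() {
        if (!streamsHealthy) {
            throw new LogSyncException("no healthy edit streams");
        }
        buffer.clear();   // pretend the buffered edits were flushed
    }
}

// Hypothetical metadata layer showing the try/catch wrapped around an op.
class SketchNamesystem {
    private final SketchEditLog editLog;
    private boolean sawSyncFailure = false;

    SketchNamesystem(SketchEditLog editLog) { this.editLog = editLog; }

    boolean mkdirs(String path) {
        try {
            editLog.logEdit("MKDIR " + path);   // FS metadata op
            return true;
        } catch (LogSyncException e) {
            sawSyncFailure = true;              // illustrative handling only
            return false;
        }
    }

    boolean sawSyncFailure() { return sawSyncFailure; }
}
```

If {{logEdit(...)}} never synced inline, only the explicit {{logSync()}} call sites would need this handling, which is the simplification proposed above.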
Note also that this patch fixes what I believe to be a bug in that
{{FSEditLog.endCurrentLogSegment()}} did not call {{FSEditLog.logSync()}}.
Please let me know if I'm right about this being a bug, or if you'd like me to
break this out into a separate JIRA.
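
To show why a missing sync at segment end could matter, here is a toy model. It is only a sketch under assumptions: {{SegmentSketch}} and its fields are hypothetical, and it assumes edits sit in an in-memory buffer until {{logSync()}} makes them durable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of one edit log segment (not the real FSEditLog).
class SegmentSketch {
    private final List<String> buffered = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();
    private boolean open = true;

    void logEdit(String op) {
        if (!open) {
            throw new IllegalStateException("segment is closed");
        }
        buffered.add(op);
    }

    // Moves everything buffered so far into the durable list.
    void logSync() {
        durable.addAll(buffered);
        buffered.clear();
    }

    // Ends the segment. Without the logSync() call, any edits still in the
    // buffer would be missing from the finalized segment.
    List<String> endCurrentLogSegment() {
        logSync();
        open = false;
        return new ArrayList<>(durable);
    }
}
```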
> Clean up and test behavior under failed edit streams
> ----------------------------------------------------
>
> Key: HDFS-2010
> URL: https://issues.apache.org/jira/browse/HDFS-2010
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Aaron T. Myers
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2010.0.patch
>
>
> Right now there is very little test coverage of situations where one or more
> of the edits directories fails. In trunk, the behavior when all of the edits
> directories are dead is that the NN prints a fatal level log message and
> calls Runtime.exit(-1).
> I don't think this is really the behavior we want. Needs a bit of thought,
> but I think something like the following would make more sense:
> - any calls currently waiting on logSync should end up throwing an exception
> - NN should probably enter safe mode
> - ops can restore edits directories and then ask the NN to restore storage,
> at which point it could exit safe mode
> - alternatively, ops could ask the NN to do saveNamespace and then shut
> it down
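
The behavior proposed in the description above could be sketched as a small state machine. This is a single-threaded illustration under assumptions, not the NameNode implementation: {{FailedStreamsSketch}}, its method names, and the use of {{IllegalStateException}} are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the proposed handling of failed edits directories.
class FailedStreamsSketch {
    private final Set<String> healthyDirs = new HashSet<>();
    private final Set<String> failedDirs = new HashSet<>();
    private boolean safeMode = false;

    FailedStreamsSketch(String... editsDirs) {
        for (String dir : editsDirs) {
            healthyDirs.add(dir);
        }
    }

    void reportDirFailure(String dir) {
        if (healthyDirs.remove(dir)) {
            failedDirs.add(dir);
        }
    }

    // With no healthy directory left, enter safe mode and throw to the
    // caller instead of exiting the process.
    void logSync() {
        if (healthyDirs.isEmpty()) {
            safeMode = true;
            throw new IllegalStateException("all edits directories have failed");
        }
    }

    // Models the operator repairing the directories and asking for a restore.
    void restoreStorage() {
        healthyDirs.addAll(failedDirs);
        failedDirs.clear();
    }

    // Leaving safe mode is allowed only once some directory is usable again.
    boolean leaveSafeMode() {
        if (healthyDirs.isEmpty()) {
            return false;
        }
        safeMode = false;
        return true;
    }

    boolean isInSafeMode() { return safeMode; }
}
```

A real implementation would also need to wake any threads already blocked in logSync so that they observe the failure; this sketch leaves that out.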