[
https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405890#comment-16405890
]
Xiao Chen commented on HDFS-13314:
----------------------------------
Thanks [~arpitagarwal] and all for the effort here. Also pinging [~yzhangal] in
case this is of interest.
I share the difficulty, and at times the frustration, of not being able to
reproduce a corruption. The idea here sounds good.
I'm inclined to agree with Arpit that we should not change the default
behavior, though. In the extreme case where someone really wants the checkpoint
done (e.g. they have not checkpointed for a long time and have accumulated lots
of edits), keeping the old behavior seems better: we cannot expect them to
reconfigure and run the checkpoint again. It may also be possible that, if the
workflow deletes a bunch of stuff (e.g. the problematic snapshot, parent dir,
etc.) and then checkpoints, the corruption does not happen at all, although
this is an untested guess.
> NameNode should optionally exit if it detects FsImage corruption
> ----------------------------------------------------------------
>
> Key: HDFS-13314
> URL: https://issues.apache.org/jira/browse/HDFS-13314
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Priority: Major
> Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects
> the following kinds of corruptions:
> # INodeReference pointing to non-existent INode
> # Duplicate entries in snapshot deleted diff list.
> This behavior is controlled via an undocumented configuration setting, and
> disabled by default.