[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444659#comment-13444659
]
Tsz Wo (Nicholas), SZE commented on HDFS-3540:
----------------------------------------------
If I have not missed anything, there are two risks in the branch-1 Recovery
Mode feature:
# If there is a stray OP_INVALID byte, it could be misinterpreted as an
end-of-log and lead to silent data loss.
# Recovery Mode does not consider the corruption length. If an edit log is
corrupted at the beginning and the admin mistakenly selects "stop reading" in
Recovery Mode, then a large portion of the edit log is ignored. This could
cause unnecessary data loss even if the edit log has been backed up, since
datanodes will delete the blocks that no longer belong to any file in the
loaded namespace. In many cases, such data loss could be prevented or reduced
because the edit log could possibly be recovered by other means. This case is
arguably an operator mistake. However, Recovery Mode makes such a mistake
possible.
The Edit Log Toleration feature does not have these two risks if the toleration
length is set to 0 (or a small number). Edit Log Toleration always checks all
bytes in the edit log, so #1 cannot happen. For #2, the length of corrupted
data being tolerated is limited by the toleration length. If an edit log is
corrupted at the beginning and the corrupted length is large, then loading the
edit log throws an exception instead of silently ignoring the data.
Therefore, I suggest removing Recovery Mode from branch-1 and changing the
default toleration length to 0.
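To make the toleration-length argument concrete, here is a minimal sketch of
such a check. It is not the branch-1 code: the class name, the method
checkTail, and the use of 0xFF as the OP_INVALID padding byte are assumptions
made for illustration only. The sketch scans every byte after the point where
decoding failed, treats trailing OP_INVALID bytes as padding, and throws if
the remaining corrupted span is longer than the toleration length.

```java
import java.io.IOException;

// Illustrative sketch only; names and the padding value are assumptions,
// not the actual branch-1 implementation.
class EditLogTolerationSketch {
    /** Assumed padding / end-of-log marker byte. */
    static final byte OP_INVALID = (byte) 0xFF;

    /**
     * Checks all bytes from firstBadOffset to the end of the log.
     *
     * @param log              the entire edit log contents
     * @param firstBadOffset   offset at which no valid op could be decoded
     * @param tolerationLength maximum number of corrupted bytes to ignore
     * @return the number of corrupted bytes that were tolerated
     * @throws IOException if the corrupted span exceeds tolerationLength
     */
    static int checkTail(byte[] log, int firstBadOffset, int tolerationLength)
            throws IOException {
        // Scan every remaining byte and remember the last one that is not padding.
        int lastNonPadding = -1;
        for (int i = firstBadOffset; i < log.length; i++) {
            if (log[i] != OP_INVALID) {
                lastNonPadding = i;
            }
        }
        // Only trailing padding remains: a clean end of log.
        int corruptionLength =
            lastNonPadding < 0 ? 0 : lastNonPadding - firstBadOffset + 1;
        if (corruptionLength > tolerationLength) {
            throw new IOException("Corruption length " + corruptionLength
                + " exceeds toleration length " + tolerationLength);
        }
        return corruptionLength;
    }

    public static void main(String[] args) throws IOException {
        // A stray OP_INVALID byte at offset 2, followed by real data.
        byte[] log = {1, 2, OP_INVALID, 7, 8, OP_INVALID};
        System.out.println("tolerated bytes: " + checkTail(log, 2, 3));
        try {
            checkTail(log, 2, 0);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a toleration length of 0, the stray OP_INVALID byte in the example is not
taken as end-of-log, because the bytes after it are also checked (risk #1), and
a long corrupted span is rejected rather than dropped (risk #2).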
> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 1.2.0
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the
> recovery mode feature in branch-1 is dramatically different from the recovery
> mode in trunk since the edit log implementations in these two branches are
> different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features. We study potential further
> improvement in this issue.