[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444975#comment-13444975
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3540:
----------------------------------------------

{quote}
Recovery mode will always prompt before doing anything which could lead to data 
loss. So no, stray OP_INVALID bytes will not lead to silent data loss.

Actually, looking at change 1349086, which was introduced by HDFS-3521, I see 
that it broke end-of-file checking by default. Since 
dfs.namenode.edits.toleration.length is -1 by default, FSEditLog#checkEndOfLog 
is never invoked. However, this is not a problem with Recovery Mode; it's a 
problem with change 1349086.
{quote}
Before HDFS-3521, there was an UNCHECKED_REGION_LENGTH in Recovery Mode.  If a 
stray OP_INVALID byte was within the unchecked region, it would cause silent 
data loss.

{quote}
Recovery Mode does consider the corruption length. The location at which the 
problem occurred is printed out. This is the message "Failed to parse edit log 
(<file name>) at position <position>, edit log length is <length>..." This 
information is provided to allow the system administrator to make an informed 
decision.
{quote}
You still do not know the corruption length since there may be padding at the 
end.  System admins won't know the padding length and so they won't be able to 
know the corruption length.
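
To illustrate the point about padding, here is a minimal sketch.  It is not 
Hadoop code: the class and method names are hypothetical, and it assumes the 
padding consists of OP_INVALID bytes with value 0xFF.  The reported position 
and file length give only an upper bound on the corruption length, and scanning 
for trailing padding gives the wrong answer when the corrupt bytes themselves 
end in 0xFF.

```java
/** Hypothetical illustration only; none of these names exist in Hadoop. */
public class PaddingAmbiguity {
    // Assumption: padding is written as OP_INVALID bytes, taken here to be 0xFF.
    static final byte OP_INVALID = (byte) 0xFF;

    /** Builds a toy log: 4 valid bytes, then the garbage, then padding bytes. */
    static byte[] buildLog(byte[] garbage, int padding) {
        byte[] log = new byte[4 + garbage.length + padding];
        for (int i = 0; i < 4; i++) {
            log[i] = (byte) i; // stand-in for valid edit log content
        }
        System.arraycopy(garbage, 0, log, 4, garbage.length);
        for (int i = 4 + garbage.length; i < log.length; i++) {
            log[i] = OP_INVALID;
        }
        return log;
    }

    /** Counts the run of OP_INVALID bytes at the very end of the log. */
    static int trailingPaddingLength(byte[] log) {
        int n = 0;
        for (int i = log.length - 1; i >= 0 && log[i] == OP_INVALID; i--) {
            n++;
        }
        return n;
    }

    /** Upper bound on corruption: every byte from the failure position to EOF. */
    static long maxCorruptionLength(long position, long fileLength) {
        return fileLength - position;
    }

    public static void main(String[] args) {
        // Case 1: 3 corrupt bytes that do not end in 0xFF, then 5 padding bytes.
        byte[] a = buildLog(new byte[] {0x7A, 0x13, 0x55}, 5);
        // Case 2: 3 corrupt bytes whose last byte happens to be 0xFF.
        byte[] b = buildLog(new byte[] {0x7A, 0x13, (byte) 0xFF}, 5);

        for (byte[] log : new byte[][] {a, b}) {
            long bound = maxCorruptionLength(4, log.length);
            long guess = bound - trailingPaddingLength(log);
            System.out.println("upper bound = " + bound + ", guess after scan = " + guess);
        }
    }
}
```

In both cases the true corruption length is 3 bytes, yet the position and 
length from the error message alone only bound it by 8, and in the second case 
the scan for trailing padding underestimates it.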

                
> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvements in this issue.
