[ 
https://issues.apache.org/jira/browse/HDFS-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627152#comment-13627152
 ] 

Todd Lipcon commented on HDFS-4300:
-----------------------------------

Hey Andrew. One question about a fault scenario: let's say a download of some 
edits is in progress, and one of the dirs fails. This would leave the tmp files 
in place. Is it possible that, in a future attempt at checkpointing, we might 
then accidentally rename one of those stale tmp files into the final location?

One potential fix for that would be to make the tmp file names use the current 
timestamp as a suffix, so that they aren't reused in later attempts.
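
To make the suggestion concrete, here is a minimal sketch of the idea using 
plain java.nio rather than the actual TransferFsImage code. The class name, the 
tmpPathFor helper, and the ".tmp.<millis>" naming are all hypothetical, chosen 
only for illustration. Each attempt writes to a tmp name derived from its own 
timestamp, so a file left behind by a failed attempt is never the one a later 
attempt renames.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: not the HDFS implementation.
class TmpDownloadSketch {

    // Hypothetical helper: tmp name is the final name plus ".tmp.<timestamp>",
    // so two attempts started at different times never share a tmp file.
    static Path tmpPathFor(Path finalPath, long timestampMillis) {
        return finalPath.resolveSibling(
                finalPath.getFileName() + ".tmp." + timestampMillis);
    }

    // Copy the stream into this attempt's tmp file, then rename it into place.
    // If the copy throws, the partial tmp file stays behind under a name that
    // no later attempt (with a different timestamp) will pick up.
    static void downloadTo(InputStream in, Path finalPath, long timestampMillis)
            throws IOException {
        Path tmp = tmpPathFor(finalPath, timestampMillis);
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A caller would pass something like System.currentTimeMillis() as the timestamp. 
Two attempts in the same millisecond would still collide, and stale tmp files 
would need to be cleaned up separately; the sketch doesn't address either.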
                
> TransferFsImage.downloadEditsToStorage should use a tmp file for destination
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-4300
>                 URL: https://issues.apache.org/jira/browse/HDFS-4300
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Andrew Wang
>            Priority: Critical
>         Attachments: hdfs-4300-1.patch
>
>
> Currently, in TransferFsImage.downloadEditsToStorage, we download the edits 
> file directly to its finalized path. So, if the transfer fails in the middle, 
> a half-written file is left and cannot be distinguished from a correct file. 
> So, future checkpoints by the 2NN will fail, since the file is truncated in 
> the middle -- but it won't ever download a good copy because it thinks it 
> already has the proper file.
