[ https://issues.apache.org/jira/browse/HADOOP-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668127#action_12668127 ]
Hairong Kuang commented on HADOOP-4663:
---------------------------------------

Is it possible for the DataNode to leave the blocks under tmp untouched at startup? Instead, it would leave those blocks for the lease recovery process to promote.

When a DataNode starts up, it reads the blocks under tmp and puts them into the OngoingCreates data structure. It then reports them to the NN. If the NN sees a tmp block that is not the last block of an under-construction file, it marks the block as corrupt; otherwise, the block really is under construction, and the NN adds it to the targets set of the file. Later, when the file's lease expires, the NN closes the file and those blocks are finalized.

The idea is to restart the DataNode in the same state it was in when it went down. Promoting blocks at startup risks polluting DFS data.

> Datanode should delete files under tmp when upgraded from 0.17
> --------------------------------------------------------------
>
>                 Key: HADOOP-4663
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4663
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.19.1
>
>         Attachments: deleteTmp.patch, deleteTmp2.patch, deleteTmp_0.18.patch,
> handleTmp1.patch
>
>
> Before 0.18, when the Datanode restarts, it deletes files under the
> data-dir/tmp directory since these files are not valid anymore. But in 0.18
> it moves these files to the normal directory, incorrectly making them valid
> blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-18 format (i.e. no generation), delete
> them.
> Currently the effect of this bug is that these files end up failing block
> verification and eventually get deleted, but they cause incorrect
> over-replication at the namenode before that.
> Also it looks like our policy regarding treating files under tmp needs to be
> defined better.
> Right now there are probably one or two more bugs with it.
> Dhruba, please file them if you remember.
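
The NameNode-side check proposed in the comment can be sketched in Java as follows. This is only an illustration under stated assumptions: `TmpBlockReportSketch`, `FileState`, `Action`, and `classify` are hypothetical names, not Hadoop classes or APIs, and treating a reported block that belongs to no known file as corrupt is an assumption the comment does not address.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types exist in Hadoop. */
public class TmpBlockReportSketch {

    /** What the NameNode would do with a tmp block reported by a DataNode. */
    public enum Action { MARK_CORRUPT, ADD_TO_TARGETS }

    /** Hypothetical stand-in for a file as seen by the NameNode. */
    public static final class FileState {
        final List<Long> blockIds;       // block ids in file order
        final boolean underConstruction; // true while the file is still open for write

        public FileState(List<Long> blockIds, boolean underConstruction) {
            this.blockIds = blockIds;
            this.underConstruction = underConstruction;
        }

        boolean isLastBlock(long blockId) {
            return !blockIds.isEmpty()
                && blockIds.get(blockIds.size() - 1) == blockId;
        }
    }

    /**
     * A tmp block is kept only if it is the last block of an
     * under-construction file; anything else is marked corrupt.
     * Assumption: a null file (block owned by no file) is also corrupt.
     */
    public static Action classify(FileState file, long reportedTmpBlockId) {
        if (file != null
                && file.underConstruction
                && file.isLastBlock(reportedTmpBlockId)) {
            return Action.ADD_TO_TARGETS;
        }
        return Action.MARK_CORRUPT;
    }

    public static void main(String[] args) {
        FileState open = new FileState(Arrays.asList(1L, 2L, 3L), true);
        System.out.println(classify(open, 3L)); // last block of an open file
        System.out.println(classify(open, 2L)); // earlier block of an open file
    }
}
```

Under this sketch, finalizing the kept blocks is left to lease expiry, as the comment suggests; nothing is promoted at DataNode startup.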