[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866940#action_12866940 ]

Todd Lipcon commented on HDFS-142:
----------------------------------

Had a test failure of TestFileAppend2 today with:

    [junit] 2010-05-12 12:20:46,249 WARN  protocol.InterDatanodeProtocol (DataNode.java:recoverBlock(1537)) - Failed to getBlockMetaDataInfo for block (=blk_7206139570868165957_1054) from datanode (=127.0.0.1:42179)
    [junit] java.io.IOException: Block blk_7206139570868165957_1054 does not exist in volumeMap.
    [junit]     at org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1250)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1425)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1521)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1616)

This failure was actually on our vanilla 0.20 Hudson, not on the append branch.

While investigating this I noticed that validateBlockMetadata is not marked 
synchronized in FSDataset, and thus accesses the volumeMap HashMap in an 
unsynchronized manner. If this read races with, for example, a rehash of the 
HashMap, it can report false non-existence for a block that is actually present.

This doesn't seem to be a problem in trunk append (the function is gone there).
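
For illustration, here is a generic sketch of the hazard (not the actual 
FSDataset code; the class and field names are made up). An unsynchronized 
HashMap.get() racing with a put() that happens to resize the map can 
transiently return null for a key that is really present, whereas a reader 
that takes the same lock as the writers cannot:

    import java.util.HashMap;
    import java.util.Map;

    public class UnsynchronizedMapRace {
      private final Map<Long, String> volumeMap = new HashMap<Long, String>();

      // Analogous to the unsynchronized validateBlockMetadata lookup: this read
      // is not protected by the lock that guards the writers, so it can race
      // with a rehash and miss an entry that is actually present.
      public boolean existsUnsafe(long blockId) {
        return volumeMap.get(blockId) != null;
      }

      // The straightforward fix: take the same lock as the writers, e.g. by
      // marking the method synchronized.
      public synchronized boolean existsSafe(long blockId) {
        return volumeMap.get(blockId) != null;
      }

      public synchronized void add(long blockId, String volume) {
        volumeMap.put(blockId, volume); // may trigger a rehash as the map grows
      }
    }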

> In 0.20, move blocks being written into a blocksBeingWritten directory
> ----------------------------------------------------------------------
>
>                 Key: HDFS-142
>                 URL: https://issues.apache.org/jira/browse/HDFS-142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>         Attachments: appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, 
> deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, handleTmp1.patch, 
> hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
> HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
> hdfs-142-minidfs-fix-from-409.txt, 
> HDFS-142-multiple-blocks-datanode-exception.patch, 
> hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
> hdfs-142-testleaserecovery-fix.txt, HDFS-142_20.patch, 
> testfileappend4-deaddn.txt
>
>
> Before 0.18, when the Datanode restarts, it deletes files under the 
> data-dir/tmp directory, since these files are not valid anymore. But in 0.18 
> it moves these files into the normal directory, incorrectly making them valid 
> blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-0.18 format (i.e. no generation stamp), 
> delete them.
> Currently the effect of this bug is that these files end up failing block 
> verification and eventually get deleted, but before that they cause incorrect 
> over-replication at the namenode.
> Also it looks like our policy regarding treating files under tmp needs to be 
> defined better. Right now there are probably one or two more bugs with it. 
> Dhruba, please file them if you remember.
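
As a rough illustration of the second option above (deleting tmp block files 
that are in the pre-0.18 layout, i.e. whose meta files carry no generation 
stamp), here is a minimal, hypothetical sketch; the class name, helper name, 
and file-name assumptions are illustrative only and not the actual FSDataset 
upgrade code:

    import java.io.File;
    import java.util.regex.Pattern;

    public class TmpBlockCleanup {
      // Pre-0.18 meta files look like "blk_<id>.meta"; 0.18+ appends a
      // generation stamp: "blk_<id>_<genstamp>.meta".
      private static final Pattern PRE_018_META =
          Pattern.compile("blk_-?\\d+\\.meta");

      // Scan data-dir/tmp and delete block/meta pairs left over from a
      // pre-0.18 datanode, which are no longer valid.
      public static void cleanPre018TmpBlocks(File tmpDir) {
        File[] files = tmpDir.listFiles();
        if (files == null) {
          return; // tmp dir missing or unreadable; nothing to do
        }
        for (File meta : files) {
          if (PRE_018_META.matcher(meta.getName()).matches()) {
            String blockName =
                meta.getName().substring(0, meta.getName().length() - ".meta".length());
            // Remove both the stale meta file and its block file.
            meta.delete();
            new File(tmpDir, blockName).delete();
          }
        }
      }
    }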

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
