[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866940#action_12866940 ]
Todd Lipcon commented on HDFS-142:
----------------------------------
Had a test failure of TestFileAppend2 today with:
[junit] 2010-05-12 12:20:46,249 WARN protocol.InterDatanodeProtocol (DataNode.java:recoverBlock(1537)) - Failed to getBlockMetaDataInfo for block (=blk_7206139570868165957_1054) from datanode (=127.0.0.1:42179)
[junit] java.io.IOException: Block blk_7206139570868165957_1054 does not exist in volumeMap.
[junit]     at org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1250)
[junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1425)
[junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1521)
[junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1616)
This failure was actually on our vanilla 0.20 Hudson, not on the append branch.
While investigating this I noticed that validateBlockMetadata is not marked
synchronized in FSDataset, and thus accesses the volumeMap HashMap in an
unsynchronized manner. If this races with, e.g., a rehash of the HashMap, it can
report false non-existence.
This doesn't seem to be a problem in trunk append, where this function is gone.
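As an aside, the suspected race is easy to picture with a standalone sketch (plain Java, not FSDataset code; reproduction is timing- and JVM-dependent, and in the worst case an unsynchronized HashMap can misbehave even worse than a false miss, e.g. looping inside get()): one thread keeps inserting into a HashMap so it rehashes, while another thread repeatedly looks up a key that is definitely present and counts any spurious misses.
{code}
import java.util.HashMap;
import java.util.Map;

// Standalone sketch, NOT HDFS code: an unsynchronized HashMap read racing
// with inserts (and therefore rehashes) from another thread can report a
// key that is present as missing. Whether this reproduces depends on timing
// and the JVM; bump -Xmx if the insert count is too much for the default heap.
public class HashMapRaceSketch {
  public static void main(String[] args) throws Exception {
    final Map<Long, String> volumeMap = new HashMap<Long, String>();
    final Long knownBlockId = Long.valueOf(-1L);
    volumeMap.put(knownBlockId, "present");     // this key is never removed

    Thread writer = new Thread(new Runnable() {
      public void run() {
        // Keep growing the map so it rehashes while the reader is looking.
        for (long i = 0; i < 2000000L; i++) {
          volumeMap.put(Long.valueOf(i), "block");
        }
      }
    });
    writer.start();

    long falseMisses = 0;
    while (writer.isAlive()) {
      // Unsynchronized read, analogous to the validateBlockMetadata path.
      if (volumeMap.get(knownBlockId) == null) {
        falseMisses++;                          // present key, missed lookup
      }
    }
    writer.join();
    System.out.println("false misses observed: " + falseMisses);
  }
}
{code}
The straightforward fix on the 0.20 side would presumably be to mark validateBlockMetadata synchronized like the neighbouring FSDataset methods, so the lookup happens under the same lock as the other volumeMap accesses.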
> In 0.20, move blocks being written into a blocksBeingWritten directory
> ----------------------------------------------------------------------
>
> Key: HDFS-142
> URL: https://issues.apache.org/jira/browse/HDFS-142
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Raghu Angadi
> Assignee: dhruba borthakur
> Priority: Blocker
> Attachments: appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch,
> deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, handleTmp1.patch,
> hdfs-142-commitBlockSynchronization-unknown-datanode.txt,
> HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt,
> hdfs-142-minidfs-fix-from-409.txt,
> HDFS-142-multiple-blocks-datanode-exception.patch,
> hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt,
> hdfs-142-testleaserecovery-fix.txt, HDFS-142_20.patch,
> testfileappend4-deaddn.txt
>
>
> Before 0.18, when the Datanode restarts, it deletes files under the
> data-dir/tmp directory since these files are no longer valid. But in 0.18 it
> moves these files to the normal directory, incorrectly making them valid
> blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-0.18 format (i.e. no generation stamp),
> delete them.
> Currently the effect of this bug is that these files end up failing block
> verification and eventually get deleted, but they cause incorrect
> over-replication at the namenode before that.
> Also, it looks like our policy regarding the treatment of files under tmp
> needs to be defined better. Right now there are probably one or two more bugs
> with it. Dhruba, please file them if you remember.
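For what it's worth, the second option above can be made concrete with a small sketch (a hypothetical helper, not any of the attached patches; it assumes pre-0.18 meta files are named blk_<id>.meta with no generation stamp, while 0.18+ meta files are blk_<id>_<genstamp>.meta): on restart, delete tmp files whose meta name lacks a generation stamp, along with their block files, and leave everything else for recovery.
{code}
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical sketch of the "delete pre-0.18 files under tmp" option above.
// Assumes pre-0.18 meta files are named blk_<id>.meta (no generation stamp)
// and 0.18+ meta files are named blk_<id>_<genstamp>.meta.
public class TmpBlockCleanupSketch {
  private static final Pattern PRE_018_META =
      Pattern.compile("blk_-?\\d+\\.meta");

  /** Delete pre-0.18 block/meta pairs under tmpDir; leave 0.18+ files alone. */
  public static void cleanUp(File tmpDir) {
    File[] files = tmpDir.listFiles();
    if (files == null) {
      return;                                   // missing or unreadable dir
    }
    for (File meta : files) {
      String name = meta.getName();
      if (PRE_018_META.matcher(name).matches()) {
        // Stale pre-upgrade write: remove the meta file and its block file.
        String blockName = name.substring(0, name.length() - ".meta".length());
        new File(tmpDir, blockName).delete();
        meta.delete();
      }
      // Files that do carry a generation stamp are kept so block recovery
      // can still use them.
    }
  }
}
{code}
The attached patches obviously handle more than this (the upgrade path, the blocksBeingWritten move itself); the sketch is only meant to make the pre-0.18 vs. 0.18 naming distinction concrete.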