[
https://issues.apache.org/jira/browse/HDFS-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553173#comment-13553173
]
Colin Patrick McCabe commented on HDFS-4051:
--------------------------------------------
It seems like the root cause is here:
{code}
2012-10-15 04:36:34,233 WARN datanode.DataNode (BlockReceiver.java:run(1014))
- IOException in BlockReceiver.run():
java.io.IOException: Failed to move meta file for ReplicaBeingWritten,
blk_2249867812860029570_1002, RBW
getNumBytes() = 119
getBytesOnDisk() = 119
getVisibleLength()= 119
getVolume() =
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
getBlockFile() =
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/rbw/blk_2249867812860029570
bytesAcked=119
bytesOnDisk=119 from
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/rbw/blk_2249867812860029570_1002.meta
to
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/finalized/blk_2249867812860029570_1002.meta
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:403)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.addBlock(LDir.java:78)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.addBlock(LDir.java:71)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addBlock(BlockPoolSlice.java:152)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlock(FsVolumeImpl.java:162)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:874)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:855)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:965)
at java.lang.Thread.run(Thread.java:662)
{code}
Sadly, {{java.io.File#renameTo}} doesn't provide an error message or
code, just a true/false return. So the true cause may always be a mystery. It
could be a disk going bad.
If we want to debug this, I think we have to find some way to get at the cause
of the rename failing. Possible solutions include:
* writing a JNI method that doesn't eat the error code
* testing things we think *might* be true (does
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/finalized
not exist, for example?)
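As a rough illustration of the second option, here is a minimal sketch that records the filesystem state around a failed rename, using only {{java.io.File}} calls. The class name {{RenameDiagnostics}} and its two methods are hypothetical and are not part of {{FsDatasetImpl}}; the checks listed are guesses at plausible causes, not a confirmed diagnosis of this failure.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: reports state that might explain a failed File#renameTo. */
final class RenameDiagnostics {

    /** Collects a few cheap checks on the source, the destination and its parent. */
    public static String describe(File src, File dst) {
        File parent = dst.getAbsoluteFile().getParentFile();
        StringBuilder sb = new StringBuilder();
        sb.append("src.exists=").append(src.exists());
        sb.append(", src.canRead=").append(src.canRead());
        sb.append(", dst.exists=").append(dst.exists());
        sb.append(", dstParent.exists=").append(parent != null && parent.exists());
        sb.append(", dstParent.isDirectory=").append(parent != null && parent.isDirectory());
        sb.append(", dstParent.canWrite=").append(parent != null && parent.canWrite());
        return sb.toString();
    }

    /** Renames src to dst; on failure, throws with the diagnostic state attached. */
    public static void renameOrExplain(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            // renameTo gives no reason, so report what can be observed afterwards.
            throw new IOException("Failed to rename " + src + " to " + dst
                + " (" + describe(src, dst) + ")");
        }
    }
}
```

The checks run after the rename has already failed, so they are racy and can only hint at the cause. On Java 7 and later, {{java.nio.file.Files#move}} throws an {{IOException}} describing the failure, which would avoid the guesswork, but the stack trace above suggests this build was running on an older JVM.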
> TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN failed
> -----------------------------------------------------------------------------
>
> Key: HDFS-4051
> URL: https://issues.apache.org/jira/browse/HDFS-4051
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0-alpha
> Reporter: Eli Collins
> Labels: test-fail
> Attachments: test.log.gz
>
>
> TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN failed
> with the following on a recent trunk job. Has been seen before in HDFS-3391
> as well on branch-2.
> {noformat}
> java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap
> expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.failNotEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:126)
> at org.junit.Assert.assertEquals(Assert.java:470)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN(TestRBWBlockInvalidation.java:99)
> {noformat}