[ 
https://issues.apache.org/jira/browse/HDFS-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553173#comment-13553173
 ] 

Colin Patrick McCabe commented on HDFS-4051:
--------------------------------------------

It seems like the root cause is here:

{code}
2012-10-15 04:36:34,233 WARN  datanode.DataNode (BlockReceiver.java:run(1014)) - IOException in BlockReceiver.run(): 
java.io.IOException: Failed to move meta file for ReplicaBeingWritten, blk_2249867812860029570_1002, RBW
  getNumBytes()     = 119
  getBytesOnDisk()  = 119
  getVisibleLength()= 119
  getVolume()       = /var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
  getBlockFile()    = /var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/rbw/blk_2249867812860029570
  bytesAcked=119
  bytesOnDisk=119 from /var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/rbw/blk_2249867812860029570_1002.meta to /var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/finalized/blk_2249867812860029570_1002.meta
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:403)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.addBlock(LDir.java:78)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.addBlock(LDir.java:71)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addBlock(BlockPoolSlice.java:152)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlock(FsVolumeImpl.java:162)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:874)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:855)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:965)
        at java.lang.Thread.run(Thread.java:662)
{code}

Sadly, {{java.io.File#renameTo}} doesn't provide an error message or 
code, just a true/false return.  So the true cause may remain a mystery.  It 
could be a disk going bad.
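
To illustrate the limitation, here is a minimal sketch that contrasts the boolean-only result of {{File#renameTo}} with {{java.nio.file.Files#move}}, which throws an {{IOException}} describing the failure. This assumes a Java 7 or later runtime, which the stack trace above suggests this build did not have, so treat it as an illustration rather than a proposed patch. The class name {{RenameContrast}} and the file names are made up for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameContrast {
    /** Old API: the caller only learns whether the rename worked. */
    static boolean renameOld(File src, File dst) {
        return src.renameTo(dst);
    }

    /**
     * NIO API (Java 7+): returns null on success, otherwise a description
     * built from the exception that Files.move threw.
     */
    static String renameNio(Path src, Path dst) {
        try {
            Files.move(src, dst);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path src = Files.createFile(dir.resolve("blk_1.meta"));
        // The "finalized" directory is deliberately never created.
        Path dst = dir.resolve("finalized").resolve("blk_1.meta");

        System.out.println("renameTo returned: " + renameOld(src.toFile(), dst.toFile()));
        System.out.println("Files.move reported: " + renameNio(src, dst));
    }
}
```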

If we want to debug this, I think we have to find some way to get at the cause 
of the rename failing.  Possible solutions include:
* writing a JNI method that doesn't eat the error code
* testing things we think *might* be true (for example, is the directory 
/var/lib/jenkins/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1571706504-172.29.81.21-1350300990925/current/finalized
 missing?)
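
The second option could be sketched roughly as follows: when {{renameTo}} returns false, check a few preconditions and put whatever looks wrong into the exception message. This is only a sketch; {{RenameDiagnostics}}, {{diagnose}} and {{renameOrExplain}} are hypothetical names, not existing HDFS code, and the checks are guesses at likely causes, not an exhaustive list. It uses only {{java.io.File}}, so it does not need anything newer than the JDK in the trace above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RenameDiagnostics {
    /** Collects guesses about why src.renameTo(dst) might have failed. */
    static List<String> diagnose(File src, File dst) {
        List<String> findings = new ArrayList<String>();
        if (!src.exists()) {
            findings.add("source does not exist: " + src);
        }
        File dstDir = dst.getParentFile();
        if (dstDir == null) {
            findings.add("destination has no parent directory: " + dst);
        } else if (!dstDir.exists()) {
            findings.add("destination directory does not exist: " + dstDir);
        } else if (!dstDir.isDirectory()) {
            findings.add("destination parent is not a directory: " + dstDir);
        } else if (!dstDir.canWrite()) {
            findings.add("destination directory is not writable: " + dstDir);
        }
        // An existing target is not always fatal (it depends on the platform),
        // but it is worth reporting.
        if (dst.exists()) {
            findings.add("destination already exists: " + dst);
        }
        File srcDir = src.getParentFile();
        if (srcDir != null && srcDir.exists() && !srcDir.canWrite()) {
            findings.add("source directory is not writable: " + srcDir);
        }
        return findings;
    }

    /** Renames src to dst, or throws with the diagnostic guesses attached. */
    static void renameOrExplain(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("Failed to rename " + src + " to " + dst
                + "; possible causes: " + diagnose(src, dst));
        }
    }
}
```

An empty list of findings would still be useful information, since it would point away from the obvious causes and toward something like a failing disk.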
                
> TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN failed
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4051
>                 URL: https://issues.apache.org/jira/browse/HDFS-4051
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>              Labels: test-fail
>         Attachments: test.log.gz
>
>
> TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN failed 
> with the following on a recent trunk job. It has been seen before on branch-2 
> as well, in HDFS-3391.
> {noformat}
> java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:91)
>       at org.junit.Assert.failNotEquals(Assert.java:645)
>       at org.junit.Assert.assertEquals(Assert.java:126)
>       at org.junit.Assert.assertEquals(Assert.java:470)
>       at org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN(TestRBWBlockInvalidation.java:99)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
