[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated HDFS-6804:
---------------------------------------
    Attachment: HDFS-6804-branch-2.8.patch

Uploading a testcase for branch-2.8. The testcase is not applicable to 
{{trunk}} and {{branch-2}}, since the transfer fails there after HDFS-10958 
and HDFS-11337; please check the traces below.

If HDFS-11060 is reverted, the testcase fails. (A rough sketch of the 
reproduction steps follows the traces.)

 *Trunk* 
{noformat}
 getBlockURI()     = 
file:/D:/OSCode/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1559881164-10.18.246.125-1496746386088/current/finalized/subdir0/subdir0/blk_1073741825
 reopen failed.  Unable to move meta file  
D:\OSCode\hadoop-trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\data\data1\current\BP-1559881164-10.18.246.125-1496746386088\current\finalized\subdir0\subdir0\blk_1073741825_1001.meta
 to rbw dir 
D:\OSCode\hadoop-trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\data\data1\current\BP-1559881164-10.18.246.125-1496746386088\current\rbw\blk_1073741825_1002.meta
        at 
org.apache.hadoop.hdfs.server.datanode.LocalReplicaInPipeline.moveReplicaFrom(LocalReplicaInPipeline.java:388)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.append(FsVolumeImpl.java:1194)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1176)

{noformat}

 *branch-2* 
{noformat}
2017-06-06 20:01:40,314 WARN  datanode.DataNode (DataNode.java:run(2488)) - 
DatanodeRegistration(127.0.0.1:52070, 
datanodeUuid=4fd53a59-936b-4e14-836d-83c30c530c1c, infoPort=52106, 
infoSecurePort=0, ipcPort=52107, 
storageInfo=lv=-57;cid=testClusterID;nsid=2117479843;c=1496750487571):Failed to 
transfer BP-974715696-10.18.246.125-1496750487571:blk_1073741825_1001 to 
127.0.0.1:52121 got 
java.io.IOException: Block 
BP-974715696-10.18.246.125-1496750487571:blk_1073741825_1001 is not valid. 
Expected block file at null does not exist.
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:810)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:417)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2442)
{noformat}

[~jojochuang] Kindly review the testcase. Sorry for the delayed response; I 
missed this.
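
For reviewers' convenience, a rough sketch of the reproduction idea (a 
sketch only, not the attached patch: the class name, sizes, and timing are 
assumptions, and a real test would need fault injection or delays to hit 
the race deterministically):

{code:java}
// Sketch: append to a block while the NameNode-scheduled transfer of that
// block is still in flight. MiniDFSCluster is the HDFS test utility.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class AppendDuringTransferSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path file = new Path("/appendTransferRace");

      // 1. Write a single-replica block and finalize it.
      try (FSDataOutputStream out = fs.create(file, (short) 1)) {
        out.write(new byte[16 * 1024]);
      }

      // 2. Raising the replication factor makes the NameNode schedule a
      //    DataTransfer of the finalized replica to the second datanode.
      fs.setReplication(file, (short) 2);

      // 3. Appending while the transfer is in flight moves the replica to
      //    rbw and bumps its generation stamp on the source, racing with
      //    the BlockSender that is reading the old block and meta files.
      try (FSDataOutputStream out = fs.append(file)) {
        out.write(new byte[1024]);
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}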

> race condition between transferring block and appending block causes 
> "Unexpected checksum mismatch exception" 
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6804
>                 URL: https://issues.apache.org/jira/browse/HDFS-6804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Gordon Wang
>            Assignee: Brahma Reddy Battula
>         Attachments: HDFS-6804-branch-2.8.patch, 
> Testcase_append_transfer_block.patch
>
>
> We found some error logs in the datanode, like this:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from 
> /192.168.2.101:39495
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Meanwhile, on the source datanode, the log says the block was transmitted:
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad 
> block to the NameNode, and the NameNode marks the replica on the source 
> datanode as corrupt. But the replica on the source datanode is actually 
> valid, because it passes checksum verification.
> In short, the replica on the source datanode is wrongly marked as corrupt.
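
To make the failure mode above concrete, here is a toy model of the mismatch 
in plain Java (not HDFS code; all names are made up). The "sender" captures 
a checksum of the replica, a concurrent append mutates the bytes before they 
are streamed, and the "receiver" recomputes a checksum that no longer matches:

{code:java}
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumRaceSketch {
  // Compute a CRC32 over the whole buffer, standing in for the per-chunk
  // checksums HDFS keeps in the replica's .meta file.
  static long crc(byte[] data) {
    CRC32 c = new CRC32();
    c.update(data);
    return c.getValue();
  }

  public static void main(String[] args) {
    byte[] replica = new byte[1024];        // finalized replica on disk
    long senderChecksum = crc(replica);     // sender reads the meta file

    // A concurrent append mutates the block after the checksum was read
    // but before the data bytes were streamed to the destination.
    replica[0] = 1;

    byte[] received = Arrays.copyOf(replica, replica.length);
    long receiverChecksum = crc(received);  // receiver recomputes

    System.out.println("sender=" + senderChecksum
        + " receiver=" + receiverChecksum
        + " mismatch=" + (senderChecksum != receiverChecksum));
  }
}
{code}

The destination sees the mismatch and reports a bad block, so the NameNode 
blames the source replica, even though once the append completes the source 
replica and its regenerated checksum are consistent again.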


