[jira] [Commented] (HDFS-11472) Fix inconsistent replica size after a data pipeline failure

Wei-Chiu Chuang (JIRA) Mon, 05 Jun 2017 12:10:24 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037419#comment-16037419
 ]


Wei-Chiu Chuang commented on HDFS-11472:
----------------------------------------

Hi [~xkrogen]. Thanks for the comment.
I think there's no harm adding the extra warning, as it might still be possible 
a similar error creeps into it even after this fix.

I am not so sure about resetting the replica's in-memory last chunk checksum. 
After the recovery initiated by {{initReplicaRecoveryImpl}}, the block may be 
read, and if the LCC does not match the data in the last chunk, the reader 
would erroneously believe the block is corrupt, which defeats the purpose of 
the fix.

The reason that {{recoverRbwImpl}} resets the replica's in-memory LCC to null, 
is that after the recovery, the block is immediately being written, so the LCC 
won't match the chunk data anyway (after the block is finalized, LCC is 
updated), and there is little benefit in making the LCC correct.

> Fix inconsistent replica size after a data pipeline failure
> -----------------------------------------------------------
>
>                 Key: HDFS-11472
>                 URL: https://issues.apache.org/jira/browse/HDFS-11472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Critical
>              Labels: release-blocker
>         Attachments: HDFS-11472.001.patch, HDFS-11472.002.patch, 
> HDFS-11472.003.patch, HDFS-11472.testcase.patch
>
>
> We observed a case where a replica's on disk length is less than acknowledged 
> length, breaking the assumption in recovery code.
> {noformat}
> 2017-01-08 01:41:03,532 WARN 
> org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to 
> obtain replica info for block 
> (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from 
> datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
> java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < 
> getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
>   getNumBytes()     = 27530
>   getBytesOnDisk()  = 27006
>   getVisibleLength()= 27268
>   getVolume()       = /data/6/hdfs/datanode/current
>   getBlockFile()    = 
> /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
>   bytesAcked=27268
>   bytesOnDisk=27006
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It turns out that if an exception is thrown within 
> {{BlockReceiver#receivePacket}}, the in-memory replica on disk length may not 
> be updated, but the data is written to disk anyway.
> For example, here's one exception we observed
> {noformat}
> 2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exception for 
> BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
> java.nio.channels.ClosedByInterruptException
>         at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There are potentially other places and causes where an exception is thrown 
> within {{BlockReceiver#receivePacket}}, so it may not make much sense to 
> alleviate it for this particular exception. Instead, we should improve 
> replica recovery code to handle the case where ondisk size is less than 
> acknowledged size, and update in-memory checksum accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-11472) Fix inconsistent replica size after a data pipeline failure

Reply via email to