Wei-Chiu Chuang created HDFS-11472:
--------------------------------------
Summary: Fix inconsistent replica size after a data pipeline
failure
Key: HDFS-11472
URL: https://issues.apache.org/jira/browse/HDFS-11472
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
We observed a case where a replica's on-disk length is less than its acknowledged
length, breaking an assumption in the recovery code.
{noformat}
2017-01-08 01:41:03,532 WARN
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain
replica info for block
(=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from
datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() <
getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
getNumBytes() = 27530
getBytesOnDisk() = 27006
getVisibleLength()= 27268
getVolume() = /data/6/hdfs/datanode/current
getBlockFile() =
/data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
bytesAcked=27268
bytesOnDisk=27006
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
at
org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
at java.lang.Thread.run(Thread.java:745)
{noformat}
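The check that fails above amounts to the following (a paraphrase for illustration only; the class and method names below are hypothetical stand-ins, not the actual code in {{FsDatasetImpl#initReplicaRecovery}}):

```java
import java.io.IOException;

// Hypothetical paraphrase of the invariant the recovery path enforces:
// the bytes known to be on disk must cover the bytes acknowledged to the
// client. The replica in the log above violates it (27006 < 27268).
public class InvariantSketch {
    static void checkReplica(long bytesOnDisk, long visibleLength)
            throws IOException {
        if (bytesOnDisk < visibleLength) {
            throw new IOException(
                "THIS IS NOT SUPPOSED TO HAPPEN: "
                + "getBytesOnDisk() < getVisibleLength()");
        }
    }
}
```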
It turns out that if an exception is thrown within
{{BlockReceiver#receivePacket}}, the replica's in-memory on-disk length may not
be updated even though the data has already been written to disk.
For example, here is one exception we observed:
{noformat}
2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception for
BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
{noformat}
There are potentially other places and causes where an exception can be thrown
within {{BlockReceiver#receivePacket}}, so it may not make much sense to work
around this particular exception. Instead, we should improve the replica
recovery code to handle the case where the on-disk size is less than the
acknowledged size, and update the in-memory checksum accordingly.
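One possible direction, sketched here as a simplified model rather than actual Hadoop code (the {{Replica}} class, its fields, and {{initRecovery}} below are all hypothetical names): when recovery finds the recorded on-disk length trailing the acknowledged length, reconcile against the actual block file length instead of throwing.

```java
// Simplified, hypothetical model of tolerant replica recovery. None of
// these names correspond to real Hadoop classes; this only illustrates
// reconciling bytesOnDisk < bytesAcked instead of failing.
public class ReplicaRecoverySketch {

    /** Minimal stand-in for a replica-being-written (RBW). */
    static final class Replica {
        long bytesAcked;   // last length acknowledged to the client
        long bytesOnDisk;  // length recorded in memory (may be stale)

        Replica(long bytesAcked, long bytesOnDisk) {
            this.bytesAcked = bytesAcked;
            this.bytesOnDisk = bytesOnDisk;
        }
    }

    /**
     * Reconcile a replica whose recorded on-disk length trails its
     * acknowledged length, using the actual length of the block file.
     * Returns the data length recovery should proceed with.
     */
    static long initRecovery(Replica r, long actualFileLength) {
        if (r.bytesOnDisk < r.bytesAcked) {
            // Data was written but the in-memory counter was never bumped
            // (e.g. an exception inside receivePacket). Trust the file,
            // capped at what was acknowledged.
            r.bytesOnDisk = Math.min(actualFileLength, r.bytesAcked);
            // A real fix would also truncate/recompute the checksum file
            // to match the reconciled data length.
        }
        return Math.min(r.bytesOnDisk, r.bytesAcked);
    }

    public static void main(String[] args) {
        // Mirrors the sizes from the log above: bytesAcked=27268,
        // recorded bytesOnDisk=27006, block file actually holds 27530.
        Replica r = new Replica(27268, 27006);
        System.out.println("recover to " + initRecovery(r, 27530));
    }
}
```

This only shows the length reconciliation; as noted above, the in-memory checksum would have to be recomputed for the reconciled range as well.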
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]