[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590228#comment-14590228 ]
Hudson commented on HDFS-4660:
------------------------------
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #229 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/229/])
HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by
Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
> Block corruption can happen during pipeline recovery
> ----------------------------------------------------
>
> Key: HDFS-4660
> URL: https://issues.apache.org/jira/browse/HDFS-4660
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Peng Zhang
> Assignee: Kihwal Lee
> Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: HDFS-4660.patch, HDFS-4660.patch, HDFS-4660.v2.patch
>
>
> Initial pipeline: DN1 DN2 DN3
> DN2 is stopped.
> Pipeline recovery adds DN4 at the 2nd position:
> DN1 DN4 DN3
> The RBW replicas are then recovered.
> DN4 after RBW recovery:
> 2013-04-01 21:02:31,570 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover
> RBW replica
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,570 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
> getNumBytes() = 134144
> getBytesOnDisk() = 134144
> getVisibleLength()= 134144
> ends at chunk 262 (134144/512 = 262)
> DN3 after RBW recovery:
> 2013-04-01 21:02:31,575 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover
> RBW replica
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,575 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
> getNumBytes() = 134028
> getBytesOnDisk() = 134028
> getVisibleLength()= 134028
> The client sends a packet after pipeline recovery:
> offset=133632 len=1008
> DN4 after flush:
> 2013-04-01 21:02:31,779 DEBUG
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file
> offset:134640; meta offset:1063
> // The meta end position should be ceil(134640/512)*4 + 7 == 263*4 + 7 == 1059,
> // but it is now 1063 (see the meta-offset sketch below).
> DN3 after flush:
> 2013-04-01 21:02:31,782 DEBUG
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005,
> type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219,
> lastPacketInBlock=false, offsetInBlock=134640,
> ackEnqueueNanoTime=8817026136871545)
> 2013-04-01 21:02:31,782 DEBUG
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing
> meta file offset of block
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from
> 1055 to 1051
> 2013-04-01 21:02:31,782 DEBUG
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file
> offset:134640; meta offset:1059
> After checking the meta file on DN4, I found that the checksum of chunk 262
> is duplicated, but the data is not.
> Later, after the block was finalized, DN4's block scanner detected the bad
> block and reported it to the NameNode. The NameNode then sent a command to
> delete this block and to re-replicate it from another DN in the pipeline to
> satisfy the replication factor.
> I think this is because BlockReceiver skips the data bytes that were already
> written, but does not skip the checksum bytes that were already written. And
> the function adjustCrcFilePosition() is only used for the last incomplete
> chunk, not for this situation (a simplified sketch of this failure mode
> follows below).
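For reference, the meta-file offsets quoted above follow from the checksum
layout: a 7-byte BlockMetadataHeader followed by one 4-byte CRC32 per 512-byte
data chunk, including the trailing partial chunk. A minimal standalone sketch
of that arithmetic (not Hadoop code; the constants are the defaults this
report uses):

{code:java}
/** Standalone sketch of the meta-file offset arithmetic quoted above. */
public class MetaOffset {
  static final int BYTES_PER_CHECKSUM = 512; // data bytes covered by one CRC
  static final int CHECKSUM_SIZE = 4;        // CRC32
  static final int HEADER_SIZE = 7;          // BlockMetadataHeader

  /** Expected meta-file end offset for a given data length in the block. */
  static long expectedMetaOffset(long dataLen) {
    long chunks = (dataLen + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
    return HEADER_SIZE + chunks * CHECKSUM_SIZE;
  }

  public static void main(String[] args) {
    // 134640 bytes = 262 full chunks + one 496-byte partial chunk,
    // i.e. 263 checksums: 263 * 4 + 7 = 1059, which is what DN3 shows.
    System.out.println(expectedMetaOffset(134640)); // 1059
    // DN4 instead ends at 1063 = 1059 + 4: exactly one duplicated CRC.
  }
}
{code}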
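The failure mode the reporter describes can be illustrated as follows (a
simplified sketch, not the actual BlockReceiver code; the real fix in the
commit above also has to handle the recomputed CRC of the last partial chunk,
which is what adjustCrcFilePosition() covers):

{code:java}
import java.io.IOException;
import java.io.OutputStream;

/**
 * Simplified sketch: a resent packet overlaps data already on disk. The data
 * write is trimmed to the new bytes, but if the checksum write is not trimmed
 * as well, the CRC of every chunk ending at or before the on-disk length is
 * appended to the meta file a second time.
 */
class OverlapSketch {
  static final int BYTES_PER_CHECKSUM = 512;
  static final int CHECKSUM_SIZE = 4;

  static void receiveOverlappingPacket(long offsetInBlock, long onDiskLen,
      byte[] data, byte[] checksums,
      OutputStream dataOut, OutputStream checksumOut) throws IOException {
    // Data bytes in this packet that the datanode already has on disk.
    int dataSkip = (int) Math.max(0, onDiskLen - offsetInBlock);
    dataOut.write(data, dataSkip, data.length - dataSkip);

    // Buggy behavior would be checksumOut.write(checksums), re-appending the
    // CRCs of the chunks already covered on disk. Skipping those CRCs keeps
    // the data and meta files in step.
    int checksumSkip = (dataSkip / BYTES_PER_CHECKSUM) * CHECKSUM_SIZE;
    checksumOut.write(checksums, checksumSkip, checksums.length - checksumSkip);
  }
}
{code}

In the scenario above, DN4 has 134144 bytes on disk and the resent packet has
offset=133632 and len=1008, so dataSkip is 512: one full chunk whose 4-byte
CRC is already in the meta file and must not be written again.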
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)