[
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440406#comment-15440406
]
Manoj Govindassamy edited comment on HDFS-10780 at 8/27/16 1:28 AM:
--------------------------------------------------------------------
The core issue here is in the handling of *write pipeline errors* and the
follow-on *race condition* between the following events:
-- Client sending the Block COMMIT to the NN
-- DN sending an IBR with the stale Block (the one with the old generation stamp) info, and
-- DN sending an IBR with the right Block (the one with the expected generation stamp) info
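To make the race concrete, here is a minimal sketch (illustrative only, not
Hadoop code; the class, method names and generation-stamp values are made up)
of the NameNode-side distinction between the two IBRs: a reported replica
whose generation stamp predates the committed one is stale and must not be
counted toward the live replica count.

```java
// Minimal sketch: classifying the two IBRs by generation stamp.
public class GenStampRace {
    static final long COMMITTED_GS = 1002;   // genstamp after pipeline recovery

    // A replica whose genstamp predates the committed genstamp is stale.
    static boolean isStaleReplica(long reportedGs) {
        return reportedGs < COMMITTED_GS;
    }

    public static void main(String[] args) {
        long staleGs = 1001;     // replica left over from the failed pipeline
        long freshGs = 1002;     // replica written by the recovered pipeline
        System.out.println("stale IBR counted? " + !isStaleReplica(staleGs));
        System.out.println("fresh IBR counted? " + !isStaleReplica(freshGs));
    }
}
```

If the COMMIT and the two IBRs interleave badly, and the fresh IBR is
mishandled while the stale one is (correctly) discarded, the NN ends up
under-counting live replicas, which is the setup for both problems below.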
I have been seeing
TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWrittenForDatanode fail very
frequently on the latest trunk. Though they all fail with the same signature,
"Timed out waiting for /test to reach 3 replicas", there could be more than one
issue here, as I can see different code paths being taken in the failure logs.
One thing common to all the failures is that they happen during pipeline error
recovery.
*Problem 1:*
-- BlockManager fails to trigger block replication in time to the missed-out
DN (the DN which is no longer in the write pipeline after pipeline error
recovery)
-- BlockManager mistakenly believes that a block reconstruction to the last DN
is already in progress and starts monitoring it via the pendingReconstruction
list
-- The previous comment explains why BlockManager was trapped into this belief.
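The effect of that mistaken belief can be sketched as follows (illustrative
only, not the real BlockManager; the map and method here are made-up stand-ins
for pendingReconstruction accounting): a block whose live count plus in-flight
pending count appears to meet the replication factor is skipped by the
replication scheduler, so a bogus pending entry stalls replication until that
entry times out.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a stale pendingReconstruction entry suppresses scheduling.
public class PendingSketch {
    // blockId -> number of in-flight reconstruction targets believed pending
    static final Map<Long, Integer> pending = new HashMap<>();

    static String scheduleReplication(long blockId, int live, int expected) {
        int inFlight = pending.getOrDefault(blockId, 0);
        if (live + inFlight >= expected) {
            // Scheduler trusts the pending count and does nothing.
            return "skip: live=" + live + " pending=" + inFlight;
        }
        pending.merge(blockId, 1, Integer::sum);
        return "schedule one new target";
    }

    public static void main(String[] args) {
        pending.put(1073741825L, 1);  // mistaken belief: one copy in flight
        // 2 live replicas, expected 3 -> skipped because of the bogus entry
        System.out.println(scheduleReplication(1073741825L, 2, 3));
    }
}
```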
*Problem 2:*
-- This is totally different from the previous case
-- A DN reported (in an IBR) the receipt of the Block with the right generation
stamp, but BlockManager failed to add this StoredBlock for the reporting DN
-- Later, BlockManager erroneously detects that the replication factor has not
been reached and tries to replicate the block to the missing DN. However, the
{{BlockPlacementPolicy}} engine fails to find a target node, as it sees that
all the given nodes already have the given Block. This detection and failed
replication continue on and on.
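The endless loop in Problem 2 can be sketched as follows (illustrative only,
not the real {{BlockPlacementPolicy}} code; the method and node names are made
up): a placement search that excludes nodes already holding the block finds no
target when every node in fact has a copy, even though the NN (wrongly)
believes one replica is still missing.

```java
import java.util.List;
import java.util.Optional;

// Minimal sketch: placement fails when every candidate already holds the block.
public class PlacementSketch {
    static Optional<String> chooseTarget(List<String> allNodes,
                                         List<String> holders) {
        // Exclude any node that already stores a replica of the block.
        return allNodes.stream().filter(n -> !holders.contains(n)).findFirst();
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3");
        // Every DN already has the block (one replica just was not recorded
        // by the NN), so the search for a replication target fails every time.
        System.out.println(chooseTarget(nodes, nodes).isPresent());
    }
}
```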
I have theories for both of the above problems. I will try to elaborate in
further comments, and I would love to have your feedback on my understanding.
> Block replication not proceeding after pipeline recovery --
> TestDataNodeHotSwapVolumes fails
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test
> testRemoveVolumeBeingWrittenForDatanode. Data write pipeline can have issues
> as there could be timeouts, data node not reachable etc, and in this test
> case it was more of induced one as one of the volumes in a datanode is
> removed while block write is in progress. Digging further in the logs, when
> the problem happens in the write pipeline, the error recovery is not
> happening as expected leading to block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
> Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3
> replicas
> Results :
> Tests in error:
>
> TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
> » Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> Following exceptions are not expected in this test run
> {noformat}
> 614 2016-08-10 12:30:11,269 [DataXceiver for client
> DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block
> BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG
> datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number
> of active connections is: 2
> 615 java.lang.IllegalMonitorStateException
> 616 at java.lang.Object.wait(Native Method)
> 617 at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
> 618 at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
> 619 at
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
> 620 at
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
> 720 2016-08-10 12:30:11,287 [DataNode:
> [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
> [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec
> t/hadoop-hdfs/target/test/data/dfs/data/data2/]] heartbeating to
> localhost/127.0.0.1:58788] ERROR datanode.DataNode
> (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool
> BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid
> 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
> 721 java.lang.NullPointerException
> 722 at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
> 723 at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
> 724 at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
> 725 at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
> 726 at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)