[
https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432306#comment-15432306
]
Manoj Govindassamy commented on HDFS-10780:
-------------------------------------------
[~shahrs87], I do see HDFS-9781 (NPE during a full block report, especially
after a volume removal) quite frequently in my TestDataNodeHotSwapVolumes
runs. For these tests, however, incremental block reports from the DataNodes
are sufficient, and they work as expected. Block report generation happens
inside a try/catch block that ignores any exception it encounters. Thanks for
pointing me to the other jira; I will follow up on that as well.
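
To illustrate the try/catch behaviour mentioned above, here is a minimal
sketch. It is not the BPServiceActor code: ReportLoopSketch, Report, runAll
and demo are hypothetical names made up for this example. It only shows how
a failing full report can be ignored while a later incremental report still
goes out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReportLoopSketch {
    /** Hypothetical stand-in for one kind of report; not a Hadoop interface. */
    public interface Report {
        String send() throws Exception;
    }

    /** Runs every report in order, logging and skipping any that throws. */
    public static List<String> runAll(List<Report> reports) {
        List<String> sent = new ArrayList<>();
        for (Report r : reports) {
            try {
                sent.add(r.send());
            } catch (Exception e) {
                // Ignored on purpose: one failed report must not stop the rest.
                System.err.println("report failed, continuing: " + e);
            }
        }
        return sent;
    }

    /** A full report that fails with an NPE, followed by an incremental one. */
    public static List<String> demo() {
        Report full = () -> { throw new NullPointerException("volume removed"); };
        Report incremental = () -> "incremental";
        return runAll(Arrays.asList(full, incremental));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch only the incremental report ends up in the returned list; the
NPE is logged and dropped.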
> Block replication not happening on removing a volume when data being written
> to a datanode -- TestDataNodeHotSwapVolumes fails
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10780
> URL: https://issues.apache.org/jira/browse/HDFS-10780
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test
> testRemoveVolumeBeingWrittenForDatanode. A data write pipeline can fail for
> many reasons, such as timeouts or an unreachable datanode. In this test
> case the failure is induced: one of the volumes in a datanode is removed
> while a block write is in progress. The logs show that when the write
> pipeline hits the problem, error recovery does not happen as expected, so
> block replication never catches up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
> Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3
> replicas
> Results :
> Tests in error:
>
> TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714 » Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> The following exceptions are not expected in this test run:
> {noformat}
> 2016-08-10 12:30:11,269 [DataXceiver for client
> DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block
> BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG
> datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number
> of active connections is: 2
> java.lang.IllegalMonitorStateException
>   at java.lang.Object.wait(Native Method)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
> 2016-08-10 12:30:11,287 [DataNode:
> [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
> [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
> heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode
> (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool
> BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid
> 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
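
For context on the first trace: Object.wait() throws
IllegalMonitorStateException whenever the calling thread does not own the
object's monitor. The sketch below only demonstrates that general Java rule
and the usual synchronized wait loop. It is not the FsVolumeList code and is
not a proposed fix; WaitMonitorSketch and its methods are hypothetical names
made up for this example.

```java
public class WaitMonitorSketch {
    private final Object lock = new Object();
    private boolean removed = false;

    /** Calls wait() without owning the monitor; returns true if the JVM rejects it. */
    public boolean waitWithoutMonitor() {
        try {
            lock.wait(10);
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Waits while owning the monitor, rechecking the condition in a loop. */
    public boolean waitForRemoval(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (!removed) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false; // timed out before the flag was set
                }
                try {
                    lock.wait(left);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    /** Sets the flag and wakes any waiters, again while owning the monitor. */
    public void markRemoved() {
        synchronized (lock) {
            removed = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitMonitorSketch s = new WaitMonitorSketch();
        System.out.println("unsynchronized wait rejected: " + s.waitWithoutMonitor());
        Thread t = new Thread(s::markRemoved);
        t.start();
        System.out.println("removal observed: " + s.waitForRemoval(1000));
        t.join();
    }
}
```

The loop around wait() matters because a waiter can wake up before the
condition it is waiting for actually holds.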