[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Brennan updated HDFS-13339:
-------------------------------
Attachment: HDFS-13339-branch-2.10.001.patch
Status: Patch Available (was: Reopened)
We have been seeing intermittent failures of TestBlockStatsMXBean on
branch-2.10.
I applied the patch from this Jira, and it does seem to resolve them.
Can we please pull this back to branch-2.10? I am submitting a patch for it;
the only change from the original is replacing the lambda in the unit test.
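For readers unfamiliar with this kind of backport change, here is a minimal
sketch of replacing a lambda with an anonymous inner class. It is not the code
from the patch. The class LambdaBackport, the Check interface, and the waitFor
and pollUntil helpers are all hypothetical stand-ins, and it assumes the lambda
was replaced because the branch builds at a pre-Java-8 source level, which this
message does not state.

```java
public class LambdaBackport {
  /** Hypothetical single-method interface; stands in for a supplier type. */
  interface Check {
    boolean get();
  }

  /** Hypothetical polling helper: re-evaluates check until true or timeout. */
  static void waitFor(Check check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.get()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  /** Polls until the condition has been evaluated target times. */
  static int pollUntil(final int target) {
    final int[] counter = {0};
    // Java 8 form (lambda):
    //   waitFor(() -> ++counter[0] >= target, 1, 1000);
    // Equivalent form without a lambda (anonymous inner class):
    waitFor(new Check() {
      @Override
      public boolean get() {
        return ++counter[0] >= target;
      }
    }, 1, 1000);
    return counter[0];
  }

  public static void main(String[] args) {
    System.out.println(pollUntil(3));
  }
}
```

The two forms behave the same; the anonymous class only needs the captured
locals to be declared final.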
> Volume reference can't be released and may lead to deadlock when DataXceiver
> does a check volume
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
> Reporter: liaoyuxiangqin
> Assignee: Zsolt Venczel
> Priority: Critical
> Labels: DataNode, volumes
> Fix For: 3.0.4, 3.1.1, 3.2.0
>
> Attachments: HDFS-13339-branch-2.10.001.patch, HDFS-13339.001.patch,
> HDFS-13339.002.patch, HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I run the unit test
> TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart,
> the process blocks in waitReplication. Details follow:
> [INFO] -------------------------------------------------------
> [INFO] T E S T S
> [INFO] -------------------------------------------------------
> [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.492 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) Time elapsed: 307.206 s <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 2 replicas
> at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
> at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)