ConfX created HDFS-17102:
----------------------------

Summary: Timeout encountered when running TestDataNodeOutlierDetectionViaMetrics
Key: HDFS-17102
URL: https://issues.apache.org/jira/browse/HDFS-17102
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ConfX
Attachments: reproduce.sh

h2. What happened:

{{TestDataNodeOutlierDetectionViaMetrics}} times out when {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a negative value.

h2. Where's the bug:

In {{TestDataNodeOutlierDetectionViaMetrics.injectFastNodesSamples}}, the test injects several packets into the nodes:
{noformat}
for (int i = 0; i < 2 * peerMetrics.getMinOutlierDetectionSamples(); ++i) {
  peerMetrics.addSendPacketDownstream(
      nodeName, random.nextInt(FAST_NODE_MAX_LATENCY_MS));
}
{noformat}
Similar logic appears in {{injectSlowNodesSamples}}. The problem is that if {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a negative value, the loop bound is not positive, so no packets are injected and the later {{waitFor}}:
{noformat}
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return peerMetrics.getOutliers().size() > 0;
  }
}, 500, 100_000);
{noformat}
keeps waiting until it times out.

h2. How to reproduce:

(1) Set {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} to {{0}}
(2) Run test: {{org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics#testOutlierIsDetected}}

h2. Stacktrace:
{noformat}
java.util.concurrent.TimeoutException: Timed out waiting for condition. Thread diagnostics:
Timestamp: 2023-07-04 04:08:54,535

"Reference Handler" daemon prio=10 tid=2 runnable
java.lang.Thread.State: RUNNABLE
        at java.base@11.0.18/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
        at java.base@11.0.18/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
        at java.base@11.0.18/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
"surefire-forkedjvm-command-thread" daemon prio=5 tid=23 runnable
java.lang.Thread.State: RUNNABLE
...
{noformat}
For an easy reproduction, run the attached reproduce.sh.

We are happy to provide a patch if this issue is confirmed.
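
The failure mode described under "Where's the bug" can be illustrated outside Hadoop with a minimal, self-contained Java sketch. Everything in it is an assumption made for illustration: the class {{SampleInjectionSketch}}, the helpers {{injectSamples}}, {{waitFor}} and {{timesOut}}, the placeholder value of {{FAST_NODE_MAX_LATENCY_MS}}, and the shortened timeout are hypothetical stand-ins, not the Hadoop test code or {{GenericTestUtils}}. The wait condition is also simplified to "at least one sample was recorded" rather than real outlier detection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class SampleInjectionSketch {
    // Placeholder bound (assumption); the real constant lives in the Hadoop test class.
    static final int FAST_NODE_MAX_LATENCY_MS = 5;

    // Same loop shape as the test: runs 2 * minSamples times,
    // which is zero iterations when minSamples <= 0.
    static List<Integer> injectSamples(int minSamples, Random random) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 2 * minSamples; ++i) {
            samples.add(random.nextInt(FAST_NODE_MAX_LATENCY_MS));
        }
        return samples;
    }

    // Simplified poll-until-true helper (hypothetical stand-in, not GenericTestUtils.waitFor).
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("Timed out waiting for condition.");
            }
            Thread.sleep(intervalMs);
        }
    }

    // Returns true if waiting for "some sample was recorded" hits the timeout.
    static boolean timesOut(int minSamples) {
        List<Integer> samples = injectSamples(minSamples, new Random(0));
        try {
            // Short timeout so the sketch finishes quickly.
            waitFor(() -> !samples.isEmpty(), 10, 100);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int configured : new int[] {-1, 0, 1000}) {
            System.out.println("minSamples=" + configured
                + " injected=" + injectSamples(configured, new Random(0)).size()
                + " timesOut=" + timesOut(configured));
        }
    }
}
```

With a configured value of 0 or -1 the loop body never executes, so the condition can never become true and the wait ends in a {{TimeoutException}}; with a positive value the condition holds on the first check.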