[
https://issues.apache.org/jira/browse/HDFS-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765263#comment-17765263
]
ASF GitHub Bot commented on HDFS-17102:
---------------------------------------
teamconfx opened a new pull request, #6072:
URL: https://github.com/apache/hadoop/pull/6072
### Description of PR
https://issues.apache.org/jira/browse/HDFS-17102
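This PR adds a lower bound on the number of packets the test injects, so that
the test no longer times out when
`dfs.datanode.peer.metrics.min.outlier.detection.samples` is set to `0` or a
negative value.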
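
For readers who want the idea without opening the diff, here is a minimal, self-contained sketch of a lower-bounded injection loop. It is not the code from the patch: the class name, the `MIN_SAMPLES_TO_INJECT` floor, and the latency bound are hypothetical placeholders, and a plain list stands in for the peer metrics object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BoundedSampleInjection {
    // Hypothetical floor; the real patch may use a different value or name.
    static final int MIN_SAMPLES_TO_INJECT = 10;
    // Placeholder bound for simulated latencies, not taken from the test.
    static final int FAST_NODE_MAX_LATENCY_MS = 30;

    // Number of samples to inject: twice the configured minimum,
    // but never fewer than the floor, even for 0 or negative settings.
    static int samplesToInject(int minOutlierDetectionSamples) {
        return Math.max(MIN_SAMPLES_TO_INJECT, 2 * minOutlierDetectionSamples);
    }

    // Stand-in for the injection loop: collects latencies in a list
    // instead of reporting them to a peer metrics object.
    static List<Integer> injectFastNodeSamples(int minOutlierDetectionSamples,
                                               Random random) {
        List<Integer> latencies = new ArrayList<>();
        int count = samplesToInject(minOutlierDetectionSamples);
        for (int i = 0; i < count; ++i) {
            latencies.add(random.nextInt(FAST_NODE_MAX_LATENCY_MS));
        }
        return latencies;
    }

    public static void main(String[] args) {
        Random random = new Random(0);
        for (int configured : new int[] {-5, 0, 1000}) {
            int injected = injectFastNodeSamples(configured, random).size();
            System.out.println(configured + " -> " + injected);
        }
    }
}
```

With this floor, configured values of `-5` and `0` both inject 10 samples, while `1000` still injects 2000, so the loop body always runs at least once.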
### How was this patch tested?
(1) Set `dfs.datanode.peer.metrics.min.outlier.detection.samples` to `0`
(2) Run test:
`org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics#testOutlierIsDetected`
The test no longer times out and passes.
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> Timeout encountered when running TestDataNodeOutlierDetectionViaMetrics
> -----------------------------------------------------------------------
>
> Key: HDFS-17102
> URL: https://issues.apache.org/jira/browse/HDFS-17102
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ConfX
> Priority: Critical
> Attachments: reproduce.sh
>
>
> h2. What happened:
> The test {{TestDataNodeOutlierDetectionViaMetrics}} times out when
> {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a
> negative value.
> h2. Where's the bug:
> In {{TestDataNodeOutlierDetectionViaMetrics.injectFastNodesSamples}} the test
> injects several packets into the nodes:
> {noformat}
> for (int i = 0;
>      i < 2 * peerMetrics.getMinOutlierDetectionSamples();
>      ++i) {
>   peerMetrics.addSendPacketDownstream(
>       nodeName, random.nextInt(FAST_NODE_MAX_LATENCY_MS));
> }
> {noformat}
> Similar logic appears in {{injectSlowNodesSamples}}. The problem with
> this code is that if
> {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a
> negative value, the loop bound is not positive, so no packets are injected
> and the later {{waitFor}}:
> {noformat}
> GenericTestUtils.waitFor(new Supplier<Boolean>() {
>   @Override
>   public Boolean get() {
>     return peerMetrics.getOutliers().size() > 0;
>   }
> }, 500, 100_000);
> {noformat}
> keeps waiting until it times out, because no outlier is ever reported.
> h2. How to reproduce:
> (1) Set {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} to {{0}}
> (2) Run test:
> {{org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics#testOutlierIsDetected}}
> h2. Stacktrace:
>
> {noformat}
> java.util.concurrent.TimeoutException:
> Timed out waiting for condition.
> Thread diagnostics:
> Timestamp: 2023-07-04 04:08:54,535
> "Reference Handler" daemon prio=10 tid=2 runnable
> java.lang.Thread.State: RUNNABLE
> at
> [email protected]/java.lang.ref.Reference.waitForReferencePendingList(Native
> Method)
> at
> [email protected]/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
> at
> [email protected]/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
> "surefire-forkedjvm-command-thread" daemon prio=5 tid=23 runnable
> java.lang.Thread.State: RUNNABLE
> ...
> {noformat}
> For an easy reproduction, run the attached reproduce.sh.
> We are happy to provide a patch if this issue is confirmed.
>