[
https://issues.apache.org/jira/browse/HADOOP-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086293#comment-18086293
]
ASF GitHub Bot commented on HADOOP-19820:
-----------------------------------------
pan3793 opened a new pull request, #8534:
URL: https://github.com/apache/hadoop/pull/8534
### Description of PR
Fix flaky test failures such as the one in this run:
https://github.com/kokonguyen191/hadoop/actions/runs/26836456695/job/79134194743
```
[INFO] Results:
[INFO]
Error: Errors:
Error: org.apache.hadoop.hdfs.server.balancer.TestBalancerLongRunningTasks.testBalancerMetricsDuplicate
Error: Run 1: TestBalancerLongRunningTasks.testBalancerMetricsDuplicate:835 expected: <0> but was: <-3>
Error: Run 2: TestBalancerLongRunningTasks.testBalancerMetricsDuplicate:797 » Metrics Metrics source RetryCache.NameNodeRetryCache already exists!
Error: Run 3: TestBalancerLongRunningTasks.testBalancerMetricsDuplicate:797 » Metrics Metrics source RetryCache.NameNodeRetryCache already exists!
...
[INFO]
Error: Tests run: 1219, Failures: 0, Errors: 6, Skipped: 0
```
Two changes in `TestBalancerLongRunningTasks`:
1. `@AfterEach shutdown()`: added a call to `DefaultMetricsSystem.shutdown()` to
reset the metrics singleton after each test, so that stale
`RetryCache.NameNodeRetryCache` registrations do not carry over between tests
in the same JVM.
2. `testBalancerMetricsDuplicate()`: wrapped the
`setMiniClusterMode(false)` block in `try-finally` so that
`setMiniClusterMode(true)` is always restored. Previously, if the assertion on
line 836 failed, the restoration was skipped, which left mini-cluster mode
disabled and poisoned all subsequent test retries.
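The two changes can be illustrated with a minimal standalone sketch. It does not use Hadoop or JUnit: `FakeMetricsSystem`, `runStrictCheck`, and `tearDown` are hypothetical stand-ins invented for illustration, and the duplicate-name rule is a simplified assumption about how a process-wide metrics registry behaves, not the actual `DefaultMetricsSystem` implementation.

```java
import java.util.HashSet;
import java.util.Set;

public class MetricsResetSketch {

  /** Hypothetical stand-in for a process-wide metrics singleton (not Hadoop's class). */
  static final class FakeMetricsSystem {
    private static final Set<String> SOURCES = new HashSet<>();
    private static boolean miniClusterMode = true;

    static void setMiniClusterMode(boolean enabled) {
      miniClusterMode = enabled;
    }

    static boolean inMiniClusterMode() {
      return miniClusterMode;
    }

    /** Assumed rule: duplicate names are rejected only when mini-cluster mode is off. */
    static void register(String name) {
      if (!SOURCES.add(name) && !miniClusterMode) {
        throw new IllegalStateException("Metrics source " + name + " already exists!");
      }
    }

    /** Drops all registrations so the next test starts from a clean registry. */
    static void shutdown() {
      SOURCES.clear();
    }
  }

  /** Change 2: the flag is restored in finally, even if the body throws. */
  static void runStrictCheck(Runnable body) {
    FakeMetricsSystem.setMiniClusterMode(false);
    try {
      body.run();
    } finally {
      FakeMetricsSystem.setMiniClusterMode(true);
    }
  }

  /** Change 1: per-test teardown that resets the shared registry. */
  static void tearDown() {
    FakeMetricsSystem.shutdown();
  }

  public static void main(String[] args) {
    try {
      runStrictCheck(() -> {
        throw new AssertionError("simulated assertion failure");
      });
    } catch (AssertionError expected) {
      // The body failed, but the finally block has already restored the flag.
    }
    System.out.println("miniClusterMode restored: " + FakeMetricsSystem.inMiniClusterMode());

    // A registration left behind by one "test" no longer breaks the next one
    // once the teardown has cleared the registry.
    FakeMetricsSystem.register("RetryCache.NameNodeRetryCache");
    tearDown();
    runStrictCheck(() -> FakeMetricsSystem.register("RetryCache.NameNodeRetryCache"));
    System.out.println("re-registration after teardown succeeded");
  }
}
```

Without the `finally` block, the simulated failure would leave the flag at `false` for every later run in the same JVM; without the teardown, the second registration would hit the "already exists" error in strict mode.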
### How was this patch tested?
By passing the GitHub Actions (GHA) CI run.
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
Assisted-by: OpenCode (Mimo-V2.5-Pro)
> Uber-JIRA: CI and Test Improvements
> -----------------------------------
>
> Key: HADOOP-19820
> URL: https://issues.apache.org/jira/browse/HADOOP-19820
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build, test, yetus
> Affects Versions: 3.5.0
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Priority: Major
> Labels: pull-request-available
>
> Epic to organize all tasks related to CI performance and quality of life
> (QoL) improvements. Some areas that need love:
> h2. UX / DevEx
> * Easier to navigate to CI failure root cause.
> * Fewer noise comments in issues / PRs.
> h2. Stability
> * Fix or replace flaky tests.
> h2. Speed
> Faster execution time. We occasionally are blocked on CI runs that take over
> 24 hours:
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8225/8/]
> Other runs tend to take ~3 hours. Ideally CI should be as fast as possible
> (10-30 minutes is ideal, if unrealistic).
> h2. Coverage
> Reduce manual testing steps, e.g. for cloud storage connectors.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)