[
https://issues.apache.org/jira/browse/HDFS-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686830#comment-17686830
]
ASF GitHub Bot commented on HDFS-16740:
---------------------------------------
snmvaughan commented on PR #4835:
URL: https://github.com/apache/hadoop/pull/4835#issuecomment-1425072948
From what I've seen, many of the timeouts are related to:
- Resources that aren't shut down properly, leaving behind background threads
that aren't daemon threads
- Interactions between tests that run concurrently and operate in the same
disk space
This fix addresses both issues: it uses a `TemporaryFolder` to avoid
sharing space between tests, and it ensures that the cluster and its resources
are always shut down. I'd hope that we would expand this to the remaining uses of
the mini clusters. Originally I had proposed new calls that actually took a
`TemporaryFolder` as a parameter, thinking we'd eventually phase out the
`File`-based calls and drop them entirely to ensure consistency.
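
As a rough illustration of the two ideas above (a private directory per test, and a cluster that is always shut down), here is a minimal sketch that runs on the JDK alone. It is not the code from the PR: `SketchCluster`, `IsolatedClusterSketch`, and `deleteRecursively` are hypothetical names invented for this example, and `Files.createTempDirectory` stands in for JUnit's `TemporaryFolder` so that no test framework or Hadoop dependency is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class IsolatedClusterSketch {

    /** Hypothetical stand-in for a mini cluster; this is NOT the Hadoop API. */
    static final class SketchCluster implements AutoCloseable {
        private final Path baseDir;
        private boolean running = true;

        SketchCluster(Path baseDir) throws IOException {
            this.baseDir = baseDir;
            // All on-disk state lives under the directory the caller provides.
            Files.createDirectories(baseDir.resolve("data"));
        }

        Path baseDir() { return baseDir; }

        boolean isRunning() { return running; }

        @Override
        public void close() {
            // A real cluster would stop its daemons and release ports here.
            running = false;
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Each run gets its own directory, so concurrent runs never share space.
        Path baseDir = Files.createTempDirectory("minicluster-sketch");
        try (SketchCluster cluster = new SketchCluster(baseDir)) {
            // The test body would go here; close() runs even if it throws.
            System.out.println("running=" + cluster.isRunning());
        } finally {
            // Runs after close(), so the directory is removed only once the cluster has stopped.
            deleteRecursively(baseDir);
        }
    }
}
```

The try-with-resources block closes the cluster before the `finally` clause deletes the directory, which mirrors the ordering a test would want: stop the cluster first, then clean up its disk space.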
In addition, I've been submitting PRs that address some of the threading
issues (e.g. [HDFS-16904](https://issues.apache.org/jira/browse/HDFS-16904) and
[HADOOP-18279](https://issues.apache.org/jira/browse/HADOOP-18279)).
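
To make the threading problem concrete: a background thread that is not a daemon thread keeps the JVM alive until it exits, so a pool that is never shut down can stall a test run. The sketch below shows one general way to guard against that, and it is not taken from the linked PRs. `BackgroundWorkerSketch`, `daemonFactory`, and `stop` are hypothetical names, and the 5-second wait is an arbitrary choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class BackgroundWorkerSketch {

    /** Creates daemon threads, which cannot keep the JVM alive on their own. */
    static ThreadFactory daemonFactory(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        };
    }

    /** Interrupts running tasks and waits briefly; returns true if the pool terminated. */
    static boolean stop(ExecutorService pool) throws InterruptedException {
        pool.shutdownNow();
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool =
            Executors.newSingleThreadExecutor(daemonFactory("sketch-worker"));
        try {
            // A long-running background task that exits when interrupted.
            pool.submit(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        } finally {
            // Explicit shutdown is the primary defence; the daemon flag is a backstop.
            System.out.println("terminated=" + stop(pool));
        }
    }
}
```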
> Mini cluster test flakiness
> ---------------------------
>
> Key: HDFS-16740
> URL: https://issues.apache.org/jira/browse/HDFS-16740
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, test
> Affects Versions: 3.4.0, 3.3.5
> Reporter: Steve Vaughan
> Assignee: Steve Vaughan
> Priority: Major
> Labels: pull-request-available
>
> Mini clusters used during HDFS unit tests are reporting test failures that do
> not appear to be directly related to submitted changes. The failures result
> either from interactions between tests run in parallel or from tests that
> share common disk space. In all cases, the tests pass when run individually
> and serially. Addressing this issue will simplify future submissions by
> eliminating the confusion introduced by these unrelated test failures.
> We can apply lessons learned from TestRollingUpgrade, which was recently
> patched to unblock a submission. The fixes involved changing the HDFS
> configuration to use temporary disk space for each individual test, and
> using try-with-resources to ensure that clusters were shut down cleanly.