[jira] [Commented] (HDFS-16740) Mini cluster test flakiness

ASF GitHub Bot (Jira) Thu, 09 Feb 2023 18:02:07 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686830#comment-17686830
 ]


ASF GitHub Bot commented on HDFS-16740:
---------------------------------------

snmvaughan commented on PR #4835:
URL: https://github.com/apache/hadoop/pull/4835#issuecomment-1425072948

   From what I've seen, many of the timeouts are related to:
   - Resources that aren't shut down properly, leaving non-daemon background 
threads running
   - Interactions between tests run concurrently, where the tests operate in 
the same disk space
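
A minimal sketch of the first failure mode and one way to contain it, using only the JDK. The class and method names (`BackgroundThreadSketch`, `daemonFactory`, `runAndShutDown`) are hypothetical and are not taken from the Hadoop code base. Worker threads are marked as daemon threads so a leaked pool cannot keep the JVM alive, and the pool is still shut down explicitly in a `finally` block.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundThreadSketch {

    /** Hypothetical helper: creates named daemon threads. */
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            // A daemon thread does not block JVM exit if it is ever leaked.
            t.setDaemon(true);
            return t;
        };
    }

    /** Runs one task, then always shuts the pool down, even on failure. */
    static boolean runAndShutDown() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(daemonFactory("worker"));
        Callable<Boolean> task = () -> Thread.currentThread().isDaemon();
        boolean ranOnDaemon;
        try {
            ranOnDaemon = pool.submit(task).get(5, TimeUnit.SECONDS);
        } finally {
            // Explicit shutdown is still the primary fix; daemon status is a safety net.
            pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return ranOnDaemon && pool.isTerminated();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("clean shutdown: " + runAndShutDown());
    }
}
```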
   
   This fix addresses both issues: it uses a `TemporaryFolder` to avoid 
sharing space between tests and ensures that the cluster and its resources are 
always shut down.  I hope that we can expand this to the remaining uses of 
the mini clusters.  Originally I had proposed new calls that took a 
`TemporaryFolder` as a parameter, thinking we'd eventually phase out the 
`File`-based calls and drop them entirely to ensure consistency.
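
The pattern can be sketched with the JDK alone. This is an illustration under stated assumptions, not the code from the PR: `FakeCluster`, `TestBody`, and `runIsolated` are hypothetical stand-ins, and `Files.createTempDirectory` plays the role that the `TemporaryFolder` plays in a real test. Each run gets its own directory, and try-with-resources closes the cluster even when the test body throws.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class IsolatedClusterSketch {

    /** Hypothetical stand-in for a mini cluster: a base directory plus a shutdown flag. */
    static final class FakeCluster implements AutoCloseable {
        private final Path baseDir;
        private boolean running = true;

        FakeCluster(Path baseDir) throws IOException {
            this.baseDir = baseDir;
            Files.createDirectories(baseDir.resolve("data"));
        }

        Path getBaseDir() { return baseDir; }
        boolean isRunning() { return running; }

        @Override
        public void close() { running = false; }
    }

    /** Hypothetical test body that is allowed to throw. */
    interface TestBody {
        void run(FakeCluster cluster) throws Exception;
    }

    /** Runs a body against a cluster rooted in a fresh, private directory. */
    static FakeCluster runIsolated(String testName, TestBody body) throws Exception {
        Path baseDir = Files.createTempDirectory(testName);
        try (FakeCluster cluster = new FakeCluster(baseDir)) {
            body.run(cluster);
            // Returned only so the caller can inspect the final state.
            return cluster;
        } finally {
            // Runs after the cluster has been closed.
            deleteRecursively(baseDir);
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order deletes children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws Exception {
        FakeCluster c = runIsolated("testExample", cluster ->
            Files.write(cluster.getBaseDir().resolve("data").resolve("block"),
                "payload".getBytes(StandardCharsets.UTF_8)));
        System.out.println("still running: " + c.isRunning());
    }
}
```

Because no two runs share a base directory, two such tests can execute concurrently without seeing each other's files.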
   
   In addition, I've been submitting PRs that address some of the threading 
issues (e.g. [HDFS-16904](https://issues.apache.org/jira/browse/HDFS-16904) and 
[HADOOP-18279](https://issues.apache.org/jira/browse/HADOOP-18279)).




> Mini cluster test flakiness
> ---------------------------
>
>                 Key: HDFS-16740
>                 URL: https://issues.apache.org/jira/browse/HDFS-16740
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, test
>    Affects Versions: 3.4.0, 3.3.5
>            Reporter: Steve Vaughan
>            Assignee: Steve Vaughan
>            Priority: Major
>              Labels: pull-request-available
>
> Mini clusters used during HDFS unit tests are reporting test failures that do 
> not appear to be directly related to submitted changes.  The failures are the 
> result of either interactions between tests run in parallel, or tests which 
> share common disk space.  In all cases, the tests can be run 
> individually and serially without any errors.  Addressing this issue will 
> simplify future submissions by eliminating the confusion introduced by these 
> unrelated test failures.
> We can apply lessons learned from TestRollingUpgrade, which was recently 
> patched to unblock a submission.  The fixes involved changing the HDFS 
> configuration to use temporary disk space for each individual test, and 
> using try-with-resources to ensure that clusters were shut down cleanly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
