wombatu-kun opened a new pull request, #16770:
URL: https://github.com/apache/iceberg/pull/16770

   ## Summary
   
   `TestHadoopCommits.testConcurrentFastAppends` is flaky: it fails 
intermittently with `org.awaitility.core.ConditionTimeoutException` at the 
barrier wait. It most recently surfaced on an unrelated PR (#16664), where it 
failed in `core-tests (17)` while the same job passed in `core-tests (21)`, 
the classic signature of a load-dependent flaky test.
   
   ## Root cause
   
   The test runs 5 threads that each commit 10 files, using an `AtomicInteger` 
barrier to force all 5 threads to attempt `newFastAppend().commit()` 
simultaneously in lock-step rounds, maximizing optimistic-concurrency 
contention on the Hadoop file-based commit. It sets `commit.retry.num-retries` 
to `threadsCount` (5), only one above the default of 4. Under that forced 
contention, file-rename races are not fair, so a committer can lose more times 
than its retry budget allows, exhaust its retries, and throw 
`CommitFailedException`. The thread that throws never increments the barrier, 
so the surviving threads block on `barrier >= round * threadsCount` until the 
awaitility timeout fires. Because the barrier is permanently stalled, no amount 
of additional wait time helps.
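
The stall can be reproduced without Iceberg. The following is a minimal sketch, not the test's actual code: the class name `BarrierStallSketch`, the method `runRounds`, and the `failingThread` parameter are hypothetical, a plain polling loop stands in for the awaitility wait, and an early `return` stands in for the `CommitFailedException`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierStallSketch {

  /**
   * Runs `threads` workers through `rounds` lock-step rounds.
   * Returns true only if every worker finished all rounds before the deadline.
   * A worker whose id equals `failingThread` gives up in round 0 without
   * incrementing the barrier (pass -1 for no failure).
   */
  public static boolean runRounds(int threads, int rounds, int failingThread, long timeoutMillis)
      throws InterruptedException {
    AtomicInteger barrier = new AtomicInteger(0);
    AtomicInteger completed = new AtomicInteger(0);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);

    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        for (int round = 0; round < rounds; round++) {
          // Wait until every worker has finished the previous round.
          while (barrier.get() < round * threads) {
            if (System.nanoTime() > deadline) {
              return; // stands in for the awaitility timeout
            }
            Thread.yield();
          }
          if (id == failingThread && round == 0) {
            return; // stands in for a commit that exhausted its retries
          }
          barrier.incrementAndGet(); // only reached after a successful "commit"
        }
        completed.incrementAndGet();
      });
    }

    pool.shutdown();
    pool.awaitTermination(timeoutMillis + 1000, TimeUnit.MILLISECONDS);
    return completed.get() == threads;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("no failure:  " + runRounds(5, 10, -1, 5000));
    System.out.println("one failure: " + runRounds(5, 10, 0, 200));
  }
}
```

In the failing run the barrier stops at 4, so the condition for round 1 (`barrier >= 5`) can never become true, and a longer `timeoutMillis` only delays the same result.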
   
   This is a recurrence of #11047. The previous fix (#12714) only raised the 
awaitility timeout from 10s to 60s, which addressed the symptom rather than the 
cause, so the failure came back at the 60s boundary.
   
   ## Fix
   
   Raise `commit.retry.num-retries` from `threadsCount` to `20`, matching the 
two sibling optimistic-commit concurrency tests `TestJdbcTableConcurrency` and 
`TestHiveTableConcurrency`, which both use `20` retries for 7 concurrent 
committers. With comfortable retry headroom, no committer exhausts its budget 
under the forced contention, so the barrier always advances and the test no 
longer stalls. This does not weaken the test: it still verifies that all 
concurrent fast appends succeed and produce the expected number of snapshots.
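
The effect of the retry budget can be illustrated with a small deterministic sketch. It is not Iceberg's retry code: the class `RetryBudgetSketch`, the method `attemptsUntilCommit`, and the `losses` parameter are hypothetical, and the sketch assumes that a budget of N retries allows N + 1 attempts in total.

```java
public class RetryBudgetSketch {

  /**
   * Simulates one committer that loses `losses` races before it can win.
   * Returns the number of attempts used, or throws once the retry budget
   * is spent (assumption: numRetries retries means numRetries + 1 attempts).
   */
  public static int attemptsUntilCommit(int numRetries, int losses) {
    int attempts = 0;
    while (true) {
      attempts++;
      if (attempts > losses) {
        return attempts; // this attempt won the race
      }
      if (attempts > numRetries) {
        // stands in for CommitFailedException
        throw new IllegalStateException("retries exhausted after " + attempts + " attempts");
      }
    }
  }

  public static void main(String[] args) {
    // With 20 retries, a committer that loses 6 races still commits.
    System.out.println("attempts used: " + attemptsUntilCommit(20, 6));
    try {
      // With 5 retries, the same committer gives up.
      attemptsUntilCommit(5, 6);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this assumption, a committer that loses 6 races fails with a budget of 5 retries but succeeds with a budget of 20, which is the headroom the change relies on.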
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

