wombatu-kun opened a new pull request, #16540:
URL: https://github.com/apache/iceberg/pull/16540

   ## Summary
   
   Closes #16530.
   
   The `iceberg-azure` integration tests (`TestADLSFileIO`, 
`TestADLSInputStream`, `TestADLSOutputStream`) start a shared Azurite container 
in the `@BeforeAll` method of `AzuriteTestBase`. They intermittently fail at startup with a 
`ContainerFetchException` caused by a `NotFoundException` (HTTP 404) while 
testcontainers pulls `mcr.microsoft.com/azure-storage/azurite:3.35.0`. The 
image tag is already pinned, so this is a transient registry/CDN failure 
(rate-limiting or CDN inconsistency), not a missing image — re-running usually 
succeeds, which is exactly what makes these tests flaky.
   
   ## What changed
   
   `AzuriteContainer` now overrides `start()` to retry transient image-fetch 
failures before giving up: up to 5 attempts with exponential backoff (2s → 4s → 
8s → 16s). Each retry re-invokes `start()`, which re-triggers the image pull, 
because testcontainers' `RemoteDockerImage` does not cache a failed resolution.
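
As a rough illustration of the schedule described above (5 attempts, backoff doubling from 2s), the loop could look something like the sketch below. This is not the code from the PR: `RetryingStart`, `startWithRetry`, and `Sleeper` are hypothetical names, the start action is modelled as a plain `Runnable`, and for brevity the sketch retries any `RuntimeException` rather than only fetch failures. The sleeper is injected so the schedule can be checked without actually waiting.

```java
/** Hypothetical helper, not the PR's actual class: retries an action with exponential backoff. */
final class RetryingStart {

  /** Hypothetical abstraction over Thread.sleep so tests can record delays instead of waiting. */
  interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  private RetryingStart() {}

  /**
   * Runs {@code start} up to {@code maxAttempts} times, doubling the delay after each failure.
   * With maxAttempts = 5 and initialBackoffMs = 2000 the delays are 2s, 4s, 8s, 16s.
   * Simplification: every RuntimeException is treated as retryable here.
   */
  static void startWithRetry(
      Runnable start, int maxAttempts, long initialBackoffMs, Sleeper sleeper) {
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        start.run();
        return; // started successfully, no further attempts needed
      } catch (RuntimeException e) {
        if (attempt >= maxAttempts) {
          throw e; // attempts exhausted: surface the last failure
        }
        try {
          sleeper.sleep(backoffMs);
        } catch (InterruptedException interrupted) {
          // Restore the interrupt flag and give up with the original failure.
          Thread.currentThread().interrupt();
          throw e;
        }
        backoffMs *= 2;
      }
    }
  }
}
```

In real use the sleeper would typically be `Thread::sleep`, and the action would delegate to the container's own `start()`.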
   
   Notably, testcontainers' own `withStartupAttempts(...)` does not help here: 
in testcontainers 2.0.5 the image is fetched (`getDockerImageName()` → 
`RemoteDockerImage`) *before* the `startupAttempts` retry loop in 
`GenericContainer.doStart()`, so a fetch failure short-circuits that loop. 
Retrying `start()` itself is what actually re-attempts the pull.
   
   The retry is deliberately narrow: it only retries when the failure (or its 
cause chain) is a `ContainerFetchException`. Genuine launch failures — e.g. a 
wait-strategy timeout surfaced as a `ContainerLaunchException` without a fetch 
cause — are rethrown immediately so real problems are never masked.
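
The "failure or its cause chain" check could be sketched roughly as follows. To keep the example runnable without testcontainers on the classpath, `FakeFetchException` and `FakeLaunchException` are hypothetical stand-ins for `ContainerFetchException` and `ContainerLaunchException`; `FetchFailures.isFetchFailure` is likewise an illustrative name, not the helper from this PR.

```java
/** Hypothetical stand-in for testcontainers' ContainerFetchException. */
class FakeFetchException extends RuntimeException {
  FakeFetchException(String message) {
    super(message);
  }
}

/** Hypothetical stand-in for testcontainers' ContainerLaunchException. */
class FakeLaunchException extends RuntimeException {
  FakeLaunchException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical predicate deciding whether a startup failure is worth retrying. */
final class FetchFailures {

  // Upper bound on how far the cause chain is followed; an arbitrary choice for this sketch.
  private static final int MAX_DEPTH = 32;

  private FetchFailures() {}

  /** Returns true if the failure itself, or any exception in its cause chain, is a fetch failure. */
  static boolean isFetchFailure(Throwable failure) {
    Throwable current = failure;
    for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
      if (current instanceof FakeFetchException) {
        return true;
      }
      current = current.getCause();
    }
    return false; // e.g. a wait-strategy timeout with no fetch cause
  }
}
```

A retry loop would consult this predicate in its catch block and rethrow immediately when it returns false, which is what keeps genuine launch failures visible.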
   
   ## Tests
   
   - New `TestAzuriteContainer` exercises the retry helper without Docker: 
succeeds without retry, retries until success, rethrows after exhausting 
attempts, does not retry non-fetch launch failures, and retries a fetch failure 
wrapped in a `ContainerLaunchException`.
   - Confirmed the existing Azurite integration tests still pass through the 
overridden `start()` (`TestADLSFileIO`, `TestADLSInputStream`, 
`TestADLSOutputStream`) with Docker.
   
   ## Note
   
   This reduces flakiness by tolerating transient fetch failures; it cannot 
mask a registry outage that persists across all retries.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

