rusackas opened a new pull request, #41069:
URL: https://github.com/apache/superset/pull/41069

   ### SUMMARY
   
   A large share of `Build & publish docker images` failures on master are 
**transient Docker Hub registry errors** during the supersetbot build/push, not 
real build breaks. From classifying recent failures:
   
   - base-image pull timeouts: `Get "https://registry-1.docker.io/v2/": ... Client.Timeout exceeded`
   - push failures: `failed to push ...: 504 Gateway Timeout` / `401 Unauthorized` from `auth.docker.io`
   - `npm error network ... ECONNRESET`
   
   Each of these fails the whole matrix leg even though a re-run would succeed. This wraps the `supersetbot docker` build invocation in a 3-attempt retry loop with a 30s backoff, mirroring the retry that already exists on the subsequent `Docker pull` step. Because buildx reuses the BuildKit layer cache from the failed attempt, a retry mostly redoes only the failed push rather than rebuilding from scratch.
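The retry itself lives in the workflow's shell step. Purely as an illustration of the intended semantics (up to 3 attempts, a fixed 30s pause between them, and a log line shaped like the one quoted in this PR), here is a minimal Python sketch. The function `run_with_retry` and its parameters are hypothetical and are not part of supersetbot or the workflow. The `sleep` and `run` hooks exist only so the loop can be exercised without Docker, and whether the real step also logs on the final failed attempt is an assumption left open here.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    backoff_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Hypothetical sketch: run `cmd` up to `attempts` times.

    Returns 0 on the first success, or the last non-zero exit code once
    every attempt has failed. Any non-zero exit is retried; there is no
    attempt to tell transient registry errors apart from real build errors.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = run(list(cmd)).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Mirrors the log line described in the testing instructions.
            print(f"Build attempt {attempt} failed; retrying in {backoff_seconds:g}s...")
            sleep(backoff_seconds)
    # All attempts exhausted: surface the last failure so the job fails.
    return returncode
```

Like the PR, the sketch retries on any non-zero exit, so a genuine build error costs two extra attempts (and about 60s of waiting) before the job fails.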
   
   This is a flakiness mitigation, paired with the disk-space fix (#41068) and 
the GHCR service-image mirror (#40880) — together they target the three 
distinct failure clusters (build network, disk, service-container network). It 
does not address the stale `apache/superset-cache` cache reference, which is a 
separate (cross-repo) issue.
   
   ### TESTING INSTRUCTIONS
   
   CI: on a transient registry error, the build step now logs `Build attempt N 
failed; retrying in 30s...` and continues, rather than failing the job. A 
genuine build error still fails after 3 attempts.
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   