Suresh-Krishna-Kusuma opened a new issue, #10457:
URL: https://github.com/apache/seatunnel/issues/10457

   ### Search before asking
   
   - [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   The `SqlServerSchemaChangeIT` and related JDBC schema evolution E2E tests are frequently flaky in CI environments.
   The root cause is primarily timing: test assertions execute before the sink container (SQL Server/MySQL) is fully ready, or before the SeaTunnel engine inside its container has stabilized.
   
   **Symptoms observed in CI logs:**
   1. `ConditionTimeoutException` in `assertSchemaEvolution`: the test waits for the source and sink to converge but times out.
   2. `Status 409: Container is not running`: the engine container dies unexpectedly during the test, likely due to startup race conditions or resource constraints.
   3. Network failures during driver downloads: `ContainerExtendedFactory` makes a single `wget` attempt, so any transient network glitch fails the setup.
   
   ### Motivation
   These flaky tests make unrelated PRs fail spuriously, wasting CI resources and costing developers time on repeated re-runs. Making these base tests more resilient will stabilize the build pipeline for the entire community.
   
   ### Proposed Changes
   I propose the following enhancements to `AbstractSchemaChangeBaseIT` and 
`ContainerExtendedFactory`:
   
   1. **Increase Timeouts**:
      - Update Awaitility timeouts in `assertSchemaEvolution` from `60s` to 
`180s` to account for slower CI runners.
   
   2. **Robust Wait Strategies**:
      - Add an explicit `Wait.forListeningPort()` wait strategy with a startup timeout (`withStartupTimeout(...)`) to the sink containers in `initSinkContainer()`.
      - Implement a `waitForSinkDbReady()` helper that verifies JDBC connectivity before the assertions run.
   
   3. **Network Resiliency**:
      - Wrap the JDBC driver download (`wget`) in a retry loop inside 
`ContainerExtendedFactory` to handle transient network failures.
   
   4. **Diagnostics**:
      - Attach an `Slf4jLogConsumer` to the sink and engine containers so their logs are streamed into the test output and are available when a run fails, aiding future debugging.
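   A minimal sketch of the proposed `waitForSinkDbReady()` helper, using only `java.sql.DriverManager` and `Connection.isValid`. The class name `SinkDbReadiness`, the parameter list, and the choice to throw `IllegalStateException` on timeout are assumptions for illustration, not existing SeaTunnel code. A listening port only shows that the socket is open; a JDBC probe additionally checks that the database accepts a login.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.time.Duration;

// Hypothetical helper class for illustration; not part of the SeaTunnel code base.
final class SinkDbReadiness {

    private SinkDbReadiness() {}

    /**
     * Polls the sink database until a JDBC connection can be opened and validated,
     * or throws IllegalStateException once the timeout has elapsed.
     */
    static void waitForSinkDbReady(
            String jdbcUrl, String user, String password, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        SQLException lastError = null;
        while (true) {
            // try-with-resources closes the probe connection after every attempt
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
                // isValid(5) asks the driver to confirm the connection is alive within 5 seconds
                if (conn.isValid(5)) {
                    return;
                }
            } catch (SQLException e) {
                // keep the latest failure so the timeout message explains what went wrong
                lastError = e;
            }
            // overflow-safe comparison of System.nanoTime() values
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                        "Sink database not ready within " + timeout + ": " + jdbcUrl, lastError);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

   In `AbstractSchemaChangeBaseIT` it could be called before `assertSchemaEvolution`, for example with `getJdbcUrl()`, `getUsername()` and `getPassword()` if the sink is modeled as a Testcontainers `JdbcDatabaseContainer` (an assumption). Awaitility, which the tests already use, could replace the hand-written polling loop.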
   
   ### Task List
   - [ ] Refactor `AbstractSchemaChangeBaseIT` to use longer timeouts.
   - [ ] Add retry logic for `wget` in test container setup.
   - [ ] Add health checks for sink containers before the assertion phases.
   - [ ] Verify stability by running `SqlServerSchemaChangeIT` locally.
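   One way to add the `wget` retry is to keep the download inside the container's shell and loop there. The sketch below builds such a command string. `DriverDownloadCommand`, its parameters, the use of `seq`, and the `wget -c` flag are assumptions for illustration, and the image is assumed to provide `bash`, `seq` and `wget` (the current setup already relies on `wget`).

```java
import java.util.Objects;

// Hypothetical helper class for illustration; not part of the SeaTunnel code base.
final class DriverDownloadCommand {

    private DriverDownloadCommand() {}

    /**
     * Builds a bash command that downloads a JDBC driver jar into libDir,
     * retrying wget up to maxAttempts times with a fixed pause between attempts.
     * Exits 0 on the first successful download, 1 if every attempt fails.
     */
    static String build(String libDir, String driverUrl, int maxAttempts, int sleepSeconds) {
        Objects.requireNonNull(libDir, "libDir");
        Objects.requireNonNull(driverUrl, "driverUrl");
        if (maxAttempts < 1 || sleepSeconds < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and sleepSeconds >= 0");
        }
        String dir = shellQuote(libDir);
        String url = shellQuote(driverUrl);
        return "mkdir -p " + dir + " && cd " + dir
                + " && for i in $(seq 1 " + maxAttempts + "); do"
                // -c resumes a partial download instead of saving a second copy as *.jar.1
                + " wget -q -c " + url + " && exit 0;"
                + " echo \"wget attempt $i failed\" >&2;"
                + " sleep " + sleepSeconds + ";"
                + " done; exit 1";
    }

    /** Wraps a value in single quotes, escaping embedded single quotes for POSIX shells. */
    static String shellQuote(String value) {
        return "'" + value.replace("'", "'\"'\"'") + "'";
    }
}
```

   Inside the `ContainerExtendedFactory`, the resulting string would be passed as `container.execInContainer("bash", "-c", command)` and the exit code checked. Looping in the shell keeps the whole download in one `exec` call; a Java-side retry around `execInContainer` would work equally well.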
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
