Suresh-Krishna-Kusuma opened a new issue, #10457: URL: https://github.com/apache/seatunnel/issues/10457
### Search before asking

- [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

The `SqlServerSchemaChangeIT` and related JDBC schema evolution E2E tests are frequently flaky in CI environments. The root cause is primarily timing: test assertions execute before the sink container (SQL Server/MySQL) is fully ready, or before the internal SeaTunnel engine has stabilized.

**Symptoms observed in CI logs:**

1. `ConditionTimeout` in `assertSchemaEvolution`: the test waits for source/sink convergence but times out.
2. `Status 409: Container is not running`: the engine container dies unexpectedly during the test, likely due to startup race conditions or resource constraints.
3. Network failures during driver downloads: `ContainerExtendedFactory` uses a single `wget` command, which fails on transient network glitches.

### Motivation

These flaky tests cause unrelated PRs to fail spuriously, wasting CI resources and developer time on repeated re-runs. Making these base tests more resilient will stabilize the build pipeline for the entire community.

### Proposed Changes

I propose the following enhancements to `AbstractSchemaChangeBaseIT` and `ContainerExtendedFactory`:

1. **Increase Timeouts**:
   - Update Awaitility timeouts in `assertSchemaEvolution` from `60s` to `180s` to account for slower CI runners.
2. **Robust Wait Strategies**:
   - Add an explicit `Wait.forListeningPort()` wait strategy and a startup timeout (`withStartupTimeout`) to sink containers in `initSinkContainer()`.
   - Implement a `waitForSinkDbReady()` helper to verify JDBC connectivity before running assertions.
3. **Network Resiliency**:
   - Wrap the JDBC driver download (`wget`) in a retry loop inside `ContainerExtendedFactory` to handle transient network failures.
4. **Diagnostics**:
   - Attach a `Slf4jLogConsumer` to the sink and engine containers to capture logs upon failure, aiding future debugging.

### Task List

- [ ] Refactor `AbstractSchemaChangeBaseIT` to use longer timeouts.
- [ ] Add retry logic for `wget` in test container setup.
- [ ] Add health checks for Sink containers before assertion phases.
- [ ] Verify stability by running `SqlServerSchemaChangeIT` locally.

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
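
### Illustrative sketch

To make the retry loop and the `waitForSinkDbReady()` idea from the proposed changes more concrete, here is a minimal sketch using only the JDK. It is not the existing SeaTunnel code: the class name `RetrySketch`, the methods `retry` and `waitForJdbcReady`, and all parameters are hypothetical names chosen for illustration. The actual driver download would be passed in as the `Callable`, and the real fix may look different.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of SeaTunnel. */
final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the action up to maxAttempts times, sleeping for delay after each
     * failed attempt except the last. Returns the first successful result,
     * or rethrows the last failure once all attempts are used up.
     */
    static <T> T retry(int maxAttempts, Duration delay, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay.toMillis());
                }
            }
        }
        throw last;
    }

    /**
     * Assumed shape of a JDBC readiness check: keeps trying to open a
     * connection and validate it until it succeeds or attempts run out.
     */
    static void waitForJdbcReady(
            String url, String user, String password, int maxAttempts, Duration delay)
            throws Exception {
        retry(maxAttempts, delay, () -> {
            try (Connection connection = DriverManager.getConnection(url, user, password)) {
                if (!connection.isValid(5)) { // 5-second validation timeout
                    throw new SQLException("connection opened but is not valid yet");
                }
                return Boolean.TRUE;
            }
        });
    }
}
```

A test setup could then call something like `RetrySketch.retry(3, Duration.ofSeconds(5), downloadDriver)` around the download step, and `RetrySketch.waitForJdbcReady(...)` with the sink container's JDBC URL before the assertion phase. A fixed delay is used here for simplicity; exponential backoff would be a reasonable alternative.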
