Ma77Ball opened a new issue, #5727: URL: https://github.com/apache/texera/issues/5727
### Summary

`PostgreSQLConnUtilSpec` (added in #5698 / commit `83df2b513`) is **flaky on `main`** and intermittently fails the `build / amber (ubuntu-22.04, 17)` job. It is a test-only bug: the production code (`PostgreSQLConnUtil.connect`) is correct and unchanged. Its twin `MySQLConnUtilSpec` shares the same pattern and is a latent flake as well.

### Evidence (same code, non-deterministic result)

Three consecutive `main` pushes, all of which compile and run the `WorkflowOperator` tests against identical test code:

| Run | Commit | `PostgreSQLConnUtilSpec` |
| --- | --- | --- |
| [27522840785](https://github.com/apache/texera/actions/runs/27522840785) | `83df2b513` (added the test) | fail |
| 27529145479 | `a09002912` | pass |
| [27529156456](https://github.com/apache/texera/actions/runs/27529156456) | `891d2adbc` | fail |

Fail, then pass, then fail on identical test code is the signature of a race, not a deterministic failure.

### Root cause

The spec mutates **process-global JVM state**: the `java.sql.DriverManager` driver registry. In `beforeAll` it deregisters every real `jdbc:postgresql:` driver and registers a capturing stub; `afterAll` restores them; the "propagate SQLException" case swaps in another throwing driver mid-test. Meanwhile:

- The CI step runs `WorkflowOperator/jacoco` in a **single JVM** (no `fork`).
- There is no `Test / parallelExecution := false` anywhere in `build.sbt` or `project/`, so sbt's default (`parallelExecution = true`) applies, meaning **ScalaTest suites in the module run concurrently in that one JVM**.

So while `PostgreSQLConnUtilSpec` has the global registry torn down or swapped, other concurrently running suites in the module touch JDBC (the module uses embedded Postgres via `DriverManager`), and the real PostgreSQL driver can self-register and coexist with the stub during that window.
The most likely concrete break is the `"propagate SQLException"` case: with the real driver also matching `jdbc:postgresql:`, `DriverManager.getConnection` can surface the real driver's *"The connection attempt failed"* `SQLException` instead of the test's `"forced-fail-for-test"`, failing `ex.getMessage.contains("forced-fail-for-test")`. The URL-capture assertions are vulnerable to the same interference. `DriverManager`'s individual methods are synchronized, but the suite's multi-step deregister, register, connect, restore sequence is **not atomic** against other threads.

### Proposed fix

Remove the dependency on global `DriverManager` mutation. Extract the pure URL-composition logic (the only application-controlled behavior) into a testable function and assert on it directly, eliminating the shared-state hazard entirely. Apply the same treatment to `MySQLConnUtilSpec`.

### Task Type

- [ ] Refactor / Cleanup
- [x] DevOps / Deployment / CI
- [x] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other
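
A minimal sketch of what the proposed fix could look like, assuming the connection URL is composed from a host, port, and database name. The object name `JdbcUrls`, the method `postgresUrl`, and its parameters are hypothetical; the issue does not show the actual signature of `PostgreSQLConnUtil.connect`, so the real URL format may differ.

```scala
// Hypothetical helper, not the actual Texera code: the parameters and the
// URL layout are assumptions made for illustration.
object JdbcUrls {
  // Pure function: it never touches java.sql.DriverManager, so a test of it
  // cannot race with other suites running in the same JVM.
  def postgresUrl(host: String, port: String, database: String): String =
    s"jdbc:postgresql://$host:$port/$database"
}

// A spec can then assert on the composed string directly, with no driver
// registration or deregistration involved.
val composed = JdbcUrls.postgresUrl("localhost", "5432", "texera_db")
assert(composed == "jdbc:postgresql://localhost:5432/texera_db")
```

Under this shape, `connect` would only pass the composed URL to `DriverManager.getConnection`, leaving no test-visible logic that depends on the global driver registry. A MySQL counterpart could follow the same pattern.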
