hello-stephen opened a new pull request, #61543: URL: https://github.com/apache/doris/pull/61543
…ging indefinitely

Problem:

FE UT occasionally hangs when a test's forked JVM process gets stuck (e.g. a mocked FE background thread that never exits, or a deadlock during test initialization). When this happens, the entire build silently stalls until the CI global timeout is hit (3h+), causing all subsequent tests in the same Surefire batch to never run.

Two concrete examples identified from CI logs:

- org.apache.doris.statistics.AnalysisTaskExecutorTest: mocked FE started but JVM never exited, hung for ~1h39m
- org.apache.doris.planner.FederationBackendPolicyTest: zero output after "Running", hung silently

Solution:

Add -Dsurefire.forkedProcessTimeoutInSeconds and -Dsurefire.forkedProcessExitTimeoutInSeconds to all mvn test invocations in run-fe-ut.sh. When a forked JVM exceeds the timeout with no progress:

- Surefire sends SIGQUIT, and the JVM prints a full thread dump to stdout (visible in CI logs), making the root cause immediately diagnosable
- Surefire then kills the process and marks the test as ERROR
- The rest of the test suite continues normally

The timeout values are configurable via env vars FE_UT_FORK_TIMEOUT (default: 600s) and FE_UT_FORK_EXIT_TIMEOUT (default: 60s), so CI pipelines can override them without touching the script.

Impact:

- No change to normal test execution
- Hung tests now fail fast with a thread dump instead of silently consuming the entire CI budget
- Overall FE UT wall-clock time reduced from 3h+ (timeout-killed) back to the expected ~1.5h
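For illustration, the env-var defaulting described in the solution could be wired up roughly as follows. This is a minimal sketch, not the actual change to run-fe-ut.sh: the helper name surefire_timeout_args is hypothetical, and the -D property names are copied verbatim from the PR description (the Surefire plugin documentation remains the authority on which property names it honours).

```shell
#!/usr/bin/env bash
# Sketch only: the real run-fe-ut.sh may be structured differently.

# Hypothetical helper (not from the PR): prints the two timeout flags,
# one per line. Env var names and defaults (600s / 60s) are the ones
# stated in the PR description; the defaults apply only when the
# variables are unset or empty.
surefire_timeout_args() {
    printf '%s\n' \
        "-Dsurefire.forkedProcessTimeoutInSeconds=${FE_UT_FORK_TIMEOUT:-600}" \
        "-Dsurefire.forkedProcessExitTimeoutInSeconds=${FE_UT_FORK_EXIT_TIMEOUT:-60}"
}

# Example use inside the script (commented out so the sketch has no side effects):
#   mapfile -t timeout_args < <(surefire_timeout_args)
#   mvn test "${timeout_args[@]}"
```

A CI pipeline could then shorten the limit for a single run with something like `FE_UT_FORK_TIMEOUT=300 ./run-fe-ut.sh`, leaving the script itself unchanged.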
