acelyc111 commented on PR #2403: URL: https://github.com/apache/incubator-pegasus/pull/2403#issuecomment-4604159564
Pushed `533f432ac` (`ci(github): set workflow default shell to bash`) to investigate the ASAN test failures seen on the previous head `f43f08c`. ## What I observed on the previous attempt All 31 `Test ASAN (...)` jobs failed at the `Unit Testing` step with: ``` ====================== run <module> ========================== zookeeper-bin cannot be found under .../.zk_install, thus try to find an existing one zookeeper-bin is found under current work dir ... ZooKeeper JMX enabled by default Using config: .../.zk_install/zookeeper-bin/bin/../conf/zoo.cfg Starting zookeeper ... FAILED TO START ##[error]Process completed with exit code 1. ``` So every ASAN test job actually got past download/build artifact, started the test runner, and then died because **`zkServer.sh start` could not bring up the embedded ZooKeeper**. ZK fails ~1 second after `Using config: ...`, which is consistent with the JVM exiting before the readiness check loop. ## Why this is surfacing on this PR Comparing GitHub Actions step config between this PR and the last historically-successful Cpp CI run on a v2.5-base PR (`limowang/fix/disk_abnormal` commit `3904180c`, run [`24069709672`](https://github.com/apache/incubator-pegasus/actions/runs/24069709672)): | | This PR (failing) | limowang's PR (success) | |---|---|---| | `shell:` for the test step | `sh -e {0}` | `bash --noprofile --norc -e -o pipefail {0}` | | `defaults.run.shell` in workflow | (absent) | `bash` | `master`'s `lint_and_test_cpp.yaml` already declares `defaults.run.shell: bash`. v2.5's copy never picked that up, so every step on v2.5 runs under `sh` (dash on Ubuntu). This wasn't visible until now because v2.5 PRs were blocked by the ASF allow-list before any job ran (this PR removes that block). ## What `533f432ac` does Adds `defaults.run.shell: bash` at the top of `.github/workflows/lint_and_test_cpp.yaml`, mirroring master. ## Caveat Whether `defaults.run.shell: bash` alone fully fixes ZK startup is **not yet proven** — ZK's actual stderr lives in `zookeeper.out` inside `dataDir`, which isn't captured by GitHub Actions logs, so the 1-second JVM-level exit could in principle have a separate root cause. The new run for `533f432ac` is just queued at the time of writing; once it reaches the `Test ASAN` matrix we'll know. If `Test ASAN` still fails the same way under bash, the next step is to capture `zookeeper.out` by adding a `cat .zk_install/zookeeper-bin/logs/zookeeper.out || true` line to the test step on failure, similar to how master's workflow handles diagnostics. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
