gabotechs opened a new issue, #1836: URL: https://github.com/apache/datafusion-ballista/issues/1836
**Environment:** Ballista 53.0.0, DataFusion 53.1.0, 1 coordinator + 11 workers (`c5n.2xlarge`: 8 vCPU, 20.5 GB RAM)

## What happened

The coordinator node crashed with no graceful shutdown while executing TPCH SF100 Q5. Queries Q1–Q4 completed successfully. The client received a connection reset mid-query and the machine became unreachable until rebooted.

Memory usage rose from 9.5% to 70.7% (14.9 GB of 20.5 GB) in the 30 minutes preceding the crash, as measured by `sysstat`. No kernel OOM logs survived the reboot (no kdump configured).

## How to reproduce

Reproducing this requires the benchmark infrastructure from https://github.com/datafusion-contrib/datafusion-distributed (`benchmarks/cdk`), which provisions an AWS cluster of `c5n.2xlarge` instances and runs Ballista as one of the benchmark engines.

**Warning:** this requires an AWS account. Running 12 `c5n.2xlarge` instances continuously costs roughly $15–20/hour; remember to tear the cluster down when done.

Follow the deploy and port-forward instructions in [`benchmarks/cdk/README.md`](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/benchmarks/cdk/README.md). Port-forward on port `9002` (Ballista HTTP) instead of `9000`. Then run:

```bash
npm run ballista-bench -- --dataset tpch_sf100
```

Additionally, that project contains a Claude SKILL that performs the provisioning and benchmarking automatically.

The crash consistently occurs during Q5 (the first query to perform a full scan + sort-shuffle of `lineitem` at SF100 scale: 600 M rows, 144 spill events, 366 s shuffle write time per stage).

Note: this requires the following benchmark housekeeping PR to be merged in `datafusion-distributed`:

- https://github.com/datafusion-contrib/datafusion-distributed/pull/485

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
