gabotechs opened a new issue, #1836:
URL: https://github.com/apache/datafusion-ballista/issues/1836

   **Environment:** Ballista 53.0.0, DataFusion 53.1.0, 1 coordinator + 11 
workers (`c5n.2xlarge`: 8 vCPU, 20.5 GB RAM)
   
   ## What happened
   
   The coordinator node crashed with no graceful shutdown while executing TPC-H 
SF100 Q5. Queries Q1–Q4 completed successfully. The client received a 
connection reset mid-query and the machine became unreachable until rebooted.
   
   Memory rose from 9.5% to 70.7% (14.9 GB of 20.5 GB) in the 30 minutes 
preceding the crash, as measured by `sysstat`. No kernel OOM logs survived the 
reboot (no kdump configured).
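   
   Because no OOM logs survived, a small sampler that appends memory usage to a file on the coordinator may help correlate the growth with query stages on the next run. The following is a minimal sketch, not part of the benchmark tooling: it assumes a Linux host exposing `/proc/meminfo`, the function names `mem_used_pct` and `log_mem_sample` are hypothetical, and the figure it prints (`(MemTotal - MemAvailable) / MemTotal`) is an approximation that may not match the `sysstat` percentage exactly.
   
   ```bash
   #!/usr/bin/env bash
   # Sketch only: helper names are illustrative, not from Ballista or sysstat.

   # Print the percentage of memory in use, computed as
   # (MemTotal - MemAvailable) / MemTotal from a meminfo-style file.
   # The file path is a parameter so the function can be checked offline.
   mem_used_pct() {
     local meminfo="${1:-/proc/meminfo}"
     awk '
       $1 == "MemTotal:"     { total = $2 }
       $1 == "MemAvailable:" { avail = $2 }
       END {
         if (total <= 0) exit 1          # MemTotal missing or unreadable
         printf "%.1f\n", (total - avail) * 100 / total
       }
     ' "$meminfo"
   }

   # Print one UTC-timestamped sample, e.g. "2024-01-01T00:00:00Z 42.3".
   log_mem_sample() {
     printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(mem_used_pct "$@")"
   }

   # Example loop (assumed log path; `sync` flushes data before a hard crash):
   #   while true; do log_mem_sample >> "$HOME/mem.log"; sync; sleep 10; done
   ```
   
   Writing the samples to disk and syncing after each one is a deliberate choice here: the machine became unreachable without a graceful shutdown, so anything held only in memory would be lost.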
   
   ## How to reproduce
   
   Requires the benchmark infrastructure from 
https://github.com/datafusion-contrib/datafusion-distributed 
(`benchmarks/cdk`), which provisions an AWS cluster of `c5n.2xlarge` instances 
and runs Ballista as one of the benchmark engines.
   
   **Warning:** this requires an AWS account. Running 12 `c5n.2xlarge` 
instances continuously costs roughly $15–20/hour; remember to tear the cluster 
down when done.
   
   Follow the deploy and port-forward instructions in 
[`benchmarks/cdk/README.md`](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/benchmarks/cdk/README.md).
 Port-forward on port `9002` (Ballista HTTP) instead of `9000`. Then run:
   
   ```bash
   npm run ballista-bench -- --dataset tpch_sf100
   ```
   
   Additionally, that project contains a Claude SKILL that performs the 
provisioning and benchmarking automatically.
   
   The crash consistently occurs during Q5 (the first query to perform a full 
scan + sort-shuffle of `lineitem` at SF100 scale: 600 M rows, 144 spill events, 
366 s shuffle write time per stage).
   
   Note: reproducing this requires the following benchmark housekeeping PR to 
be merged in `datafusion-distributed`:
   - https://github.com/datafusion-contrib/datafusion-distributed/pull/485


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

