milenkovicm opened a new issue, #1795: URL: https://github.com/apache/datafusion-ballista/issues/1795
**Describe the bug**

When a job fails, the job itself gets the proper status "Failed":

<img width="1146" height="105" alt="Image" src="https://github.com/user-attachments/assets/7ab6db4e-9211-481b-8dbf-a97b7077e12d" />

but the underlying stages and tasks remain stuck with the status "Running":

<img width="1042" height="101" alt="Image" src="https://github.com/user-attachments/assets/c85b70c8-2235-4127-81cc-0cebdb84b296" />

<img width="1167" height="134" alt="Image" src="https://github.com/user-attachments/assets/6c0b4312-d2ed-4d44-a8e3-ba1f8ea97eb4" />

**To Reproduce**

Steps to reproduce the behavior: run the TPC-H benchmark with the failure generator (the `ballista.testing.chaos_execution.*` options) enabled:

```
cargo run --bin tpch -- benchmark ballista -p /Users/marko/TMP/tpch_data/tpch-data-sf10/ -f parquet -i 1 --port 50050 --host 127.0.0.1 -c datafusion.execution.target_partitions=24 -c ballista.planner.adaptive.enabled=true -c ballista.testing.chaos_execution.enabled=true -c ballista.testing.chaos_execution.fault_type=fatal -c ballista.testing.chaos_execution.probability=0.35 -q 1
```

This should trigger a job failure.

**Expected behavior**

When a job is cancelled or fails, its running stages and tasks should be cancelled and marked as cancelled. It still has to be confirmed whether a job failure actually issues a "cancel" for its stages.

**Additional context**

#1793 and #1789 should be merged beforehand, as they would simplify testing.
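To make the expected behavior concrete, here is a minimal sketch of the state transition being asked for. It is not Ballista's scheduler code: `Status`, `Stage`, `Job`, and `fail_job` are hypothetical names invented for illustration, and the real execution graph is considerably more involved. The sketch only assumes that failing a job should also move every still-running stage and task to a cancelled state while leaving finished ones untouched.

```rust
// Hypothetical, simplified model. These are NOT Ballista's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Running,
    Successful,
    Failed,
    Cancelled,
}

#[derive(Debug)]
pub struct Stage {
    pub status: Status,
    /// One status per task in the stage.
    pub tasks: Vec<Status>,
}

#[derive(Debug)]
pub struct Job {
    pub status: Status,
    pub stages: Vec<Stage>,
}

/// Marks the job as failed and cancels every stage and task that is still
/// running. Stages and tasks that already finished keep their status.
/// Returns the number of tasks that were moved to `Cancelled`.
pub fn fail_job(job: &mut Job) -> usize {
    job.status = Status::Failed;
    let mut cancelled = 0;
    for stage in job.stages.iter_mut() {
        for task in stage.tasks.iter_mut() {
            if *task == Status::Running {
                *task = Status::Cancelled;
                cancelled += 1;
            }
        }
        if stage.status == Status::Running {
            stage.status = Status::Cancelled;
        }
    }
    cancelled
}
```

In this model, the bug reported here would correspond to only `job.status` being updated, with the loop over stages and tasks never running. A regression test for the real fix could assert the same invariant: once a job is failed, none of its stages or tasks reports a running status.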
