slfan1989 opened a new issue, #2022:
URL: https://github.com/apache/auron/issues/2022

   ### Description
   Running TPC-DS queries (q60–q69) with Spark 4.0 / JDK 21 triggers a native 
engine panic when `WrappedSender::send` attempts to send data into a closed 
channel. This panic escalates into a JVM crash with exit code 134.
   
   ### Environment
   - **Spark Version**: 4.0
   - **Scala Version**: 2.13
   - **JDK Version**: 21 (Temurin 21.0.10+7)
   - **Workload**: TPC-DS q60–q69
   - **Date**: 2026-02-18
   
   ### Observed Behavior
   Multiple threads panic at:
   ```
   2026-02-18T06:17:59.2883677Z thread 'auron-native-stage-112-part-2-tid-136' 
panicked at 
native-engine/datafusion-ext-plans/src/common/execution_context.rs:723:35:
   2026-02-18T06:17:59.2884876Z output_with_sender: send error: channel closed
   2026-02-18T06:17:59.2885627Z stack backtrace:
   2026-02-18T06:17:59.2886022Z    0: __rustc::rust_begin_unwind
   2026-02-18T06:17:59.2886773Z              at 
/rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
   2026-02-18T06:17:59.2887628Z    1: core::panicking::panic_fmt
   2026-02-18T06:17:59.2888362Z              at 
/rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
   2026-02-18T06:17:59.2889530Z    2: 
datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}::{{closure}}
   2026-02-18T06:17:59.2890682Z    3: 
datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}
   2026-02-18T06:17:59.2894606Z    4: 
datafusion_ext_plans::sort_exec::send_output_batch::{{closure}}
   2026-02-18T06:17:59.2895716Z    5: 
<core::panic::unwind_safe::AssertUnwindSafe<F> as 
core::future::future::Future>::poll
   2026-02-18T06:17:59.2907421Z    6: 
datafusion_ext_plans::common::execution_context::ExecutionContext::output_with_sender_impl::{{closure}}
   2026-02-18T06:17:59.2908462Z    7: 
tokio::runtime::task::core::Core<T,S>::poll
   2026-02-18T06:17:59.2909093Z    8: tokio::runtime::task::harness::poll_future
   2026-02-18T06:17:59.2910064Z    9: 
tokio::runtime::task::harness::Harness<T,S>::poll_inner
   2026-02-18T06:17:59.2910769Z   10: 
tokio::runtime::task::harness::Harness<T,S>::poll
   2026-02-18T06:17:59.2911509Z   11: 
tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   2026-02-18T06:17:59.2912330Z   12: 
tokio::runtime::scheduler::multi_thread::worker::Context::run
   2026-02-18T06:17:59.2913064Z   13: 
tokio::runtime::context::scoped::Scoped<T>::set
   2026-02-18T06:17:59.2913652Z   14: tokio::runtime::context::set_scheduler
   2026-02-18T06:17:59.2914722Z   15: 
tokio::runtime::context::runtime::enter_runtime
   2026-02-18T06:17:59.2920955Z   16: 
tokio::runtime::scheduler::multi_thread::worker::run
   2026-02-18T06:17:59.2923760Z   17: 
<tokio::runtime::blocking::task::BlockingTask<T> as 
core::future::future::Future>::poll
   2026-02-18T06:17:59.2925198Z   18: 
tokio::runtime::task::core::Core<T,S>::poll
   2026-02-18T06:17:59.2925774Z   19: tokio::runtime::task::harness::poll_future
   2026-02-18T06:17:59.2926385Z   20: 
tokio::runtime::task::harness::Harness<T,S>::poll_inner
   2026-02-18T06:17:59.2927010Z   21: tokio::runtime::task::raw::poll
   2026-02-18T06:17:59.2927558Z   22: tokio::runtime::blocking::pool::Inner::run
   2026-02-18T06:17:59.2928362Z note: Some details are omitted, run with 
`RUST_BACKTRACE=full` for a verbose backtrace.
   ```
   
   ### Expected Behavior
   When the downstream channel is closed (due to task cancellation or early 
completion), the native engine should gracefully drop the batch without 
panicking or crashing the JVM.
   
   ### Root Cause
   Race condition: producer task continues sending data while the receiver has 
already been closed/canceled. The current implementation panics on send failure 
instead of handling it gracefully.
   
   ### Solution
   Modified `WrappedSender::send` in `execution_context.rs` to gracefully 
handle channel closure:
   - Check `send().await.is_err()` instead of panicking
   - Log debug message with context (partition_id, task_id, session_id)
   - Return early without updating metrics when channel is closed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to