BrytonLee opened a new issue, #1957:
URL: https://github.com/apache/auron/issues/1957

   **Describe the bug**
   Executors core dump and/or panic when running queries such as TPC-DS Q75 or TPC-H Q17. A few of the error messages follow:
   
   ***Panics at SendError***
   ```
   thread 'auron-native-stage-15-part-1-tid-119' panicked at native-engine/auron/src/lib.rs:58:64:
   called `Result::unwrap()` on an `Err` value: SendError { .. }
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   26/01/23 10:17:13 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[auron native task 1.0 in stage 15.0 (TID 119),5,main]
   java.lang.RuntimeException: called `Result::unwrap()` on an `Err` value: SendError { .. }
   ```
   ***Backtrace of the panic***
   ```
   26/01/22 07:30:48 INFO Executor: Running task 19.1 in stage 114.0 (TID 1069)
      0: __rustc::rust_begin_unwind
                at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
      1: core::panicking::panic_fmt
                at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
      2: core::result::unwrap_failed
                at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/result.rs:1732:5
      3: auron::handle_unwinded_scope
      4: auron::rt::NativeExecutionRuntime::start::{{closure}}
      5: tokio::runtime::task::core::Core<T,S>::poll
      6: tokio::runtime::task::raw::poll
      7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
      8: tokio::runtime::scheduler::multi_thread::worker::Context::run
      9: tokio::runtime::context::scoped::Scoped<T>::set
     10: tokio::runtime::context::runtime::enter_runtime
     11: tokio::runtime::scheduler::multi_thread::worker::run
     12: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
     13: tokio::runtime::task::core::Core<T,S>::poll
     14: tokio::runtime::task::harness::Harness<T,S>::poll
     15: tokio::runtime::blocking::pool::Inner::run
   ```
   
   ***Coredumps***
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x00007bb0b4aaa575, pid=1588146, tid=1877402
   #
   # JRE version: OpenJDK Runtime Environment (17.0.16+8) (build 17.0.16+8-Ubuntu-0ubuntu124.04.1)
   # Java VM: OpenJDK 64-Bit Server VM (17.0.16+8-Ubuntu-0ubuntu124.04.1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
   # Problematic frame:
   # [2504.856s][info   ][gc,start       ] GC(692) Pause Young (Prepare Mixed) (G1 Evacuation Pause)
   [2504.856s][info   ][gc,task        ] GC(692) Using 43 workers of 43 for evacuation
   [2504.861s][info   ][gc,phases      ] GC(692)   Pre Evacuate Collection Set: 0.2ms
   [2504.861s][info   ][gc,phases      ] GC(692)   Merge Heap Roots: 0.2ms
   [2504.861s][info   ][gc,phases      ] GC(692)   Evacuate Collection Set: 3.7ms
   [2504.861s][info   ][gc,phases      ] GC(692)   Post Evacuate Collection Set: 1.1ms
   [2504.861s][info   ][gc,phases      ] GC(692)   Other: 0.2ms
   [2504.861s][info   ][gc,heap        ] GC(692) Eden regions: 270->0(18)
   [2504.861s][info   ][gc,heap        ] GC(692) Survivor regions: 4->4(35)
   [2504.861s][info   ][gc,heap        ] GC(692) Old regions: 69->69
   [2504.861s][info   ][gc,heap        ] GC(692) Archive regions: 2->2
   [2504.861s][info   ][gc,heap        ] GC(692) Humongous regions: 40->40
   [2504.861s][info   ][gc,metaspace   ] GC(692) Metaspace: 106036K(107648K)->106036K(107648K) NonClass: 93465K(94336K)->93465K(94336K) Class: 12571K(13312K)->12571K(13312K)
   [2504.861s][info   ][gc             ] GC(692) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 758M->218M(894M) 5.411ms
   [2504.861s][info   ][gc,cpu         ] GC(692) User=0.08s Sys=0.01s Real=0.00s
   C  [libauron-4547940331120690501.tmp+0x16aa575][thread 1877388 also had an error]
     datafusion_ext_commons::arrow::eq_comparator::EqComparator::eq::hffa5a7c62813e2e3+0x35
   #
   # Core dump will be written. Default location: /var/coredumps/core.%e.1588146.%t
   #
   # An error report file with more information is saved as:
   # /tmp/hadoop-saying/nm-local-dir/usercache/saying/appcache/application_1765502793146_0014/container_1765502793146_0014_01_000004/hs_err_pid1588146.log
   #
   # If you would like to submit a bug report, please visit:
   #   https://bugs.launchpad.net/ubuntu/+source/openjdk-17
   # The crash happened outside the Java Virtual Machine in native code.
   # See problematic frame for where to report the bug.
   ```
   
   **To Reproduce**
   This bug reproduces with high probability when running TPC-DS Q95 or TPC-H Q17.
   
   
   **Additional context**
   - The SIGILL coredump is not caused by a cross-platform compatibility issue; Rust terminates the program after the panic with `ud2` (an undefined instruction).
   - We are working on this issue; please contact us if you'd like to help. Thanks.
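
For readers unfamiliar with the first panic message: a `SendError` reaching `unwrap()` generally means a value was sent on a channel whose receiving end had already been dropped. The sketch below reproduces that message shape with the standard library's `std::sync::mpsc` channel. It is an illustration only, not Auron's code; the function name `send_after_receiver_dropped` is hypothetical, and which channel type Auron uses at `lib.rs:58` is not stated in this report.

```rust
use std::sync::mpsc;

/// Sends one value after the receiver has been dropped and returns the result.
/// (Hypothetical helper for illustration; not part of Auron.)
fn send_after_receiver_dropped() -> Result<(), mpsc::SendError<i32>> {
    let (tx, rx) = mpsc::channel::<i32>();
    // The consumer goes away first, for example because its task failed.
    drop(rx);
    // With no receiver left, `send` hands the value back inside `SendError`.
    tx.send(42)
}

fn main() {
    match send_after_receiver_dropped() {
        Ok(()) => println!("sent"),
        // Calling `.unwrap()` on this result instead would panic with
        // "called `Result::unwrap()` on an `Err` value: SendError { .. }".
        Err(err) => eprintln!("receiver is gone, value {} was not delivered", err.0),
    }
}
```

Handling the `Err` branch instead of unwrapping turns the failure into an ordinary error path, which may help show why the receiver disappeared in the first place.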
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

