BrytonLee opened a new issue, #1957:
URL: https://github.com/apache/auron/issues/1957
**Describe the bug**
Executor core dumps and/or panics occur when running SQL queries such as TPC-DS
Q75 or TPC-H Q17. The following are some of the error messages:
***Panics at SendError***
```
thread 'auron-native-stage-15-part-1-tid-119' panicked at
native-engine/auron/src/lib.rs:58:64:
called `Result::unwrap()` on an `Err` value: SendError { .. }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
26/01/23 10:17:13 ERROR SparkUncaughtExceptionHandler: Uncaught exception in
thread Thread[auron native task 1.0 in stage 15.0 (TID 119),5,main]
java.lang.RuntimeException: called `Result::unwrap()` on an `Err` value:
SendError { .. }
```
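
For readers unfamiliar with this message, here is a minimal sketch of one way to produce the same panic text. It assumes a `std::sync::mpsc` channel whose receiver has already been dropped. Which channel type is actually used at `native-engine/auron/src/lib.rs:58` is not stated in this report, and the function names below are hypothetical.

```rust
use std::panic;
use std::sync::mpsc;

/// Hypothetical helper: sends on a channel whose receiver is already gone.
/// `send` returns `Err(SendError(..))` because nobody can receive the value.
fn send_after_receiver_dropped() -> Result<(), mpsc::SendError<i32>> {
    let (tx, rx) = mpsc::channel::<i32>();
    drop(rx); // simulate the receiving side going away first
    tx.send(42)
}

/// Hypothetical helper: runs the failing `unwrap()` inside `catch_unwind`
/// and returns the panic message as a `String`, if one was captured.
fn panic_message_from_unwrap() -> Option<String> {
    let result = panic::catch_unwind(|| {
        // Unwrapping the `Err` is what triggers the panic.
        send_after_receiver_dropped().unwrap();
    });
    match result {
        Ok(()) => None,
        Err(payload) => payload
            .downcast_ref::<String>()
            .cloned()
            .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string())),
    }
}

fn main() {
    // Handling the `Result` explicitly avoids the panic altogether.
    if let Err(err) = send_after_receiver_dropped() {
        println!("send failed: {err:?}");
    }
    // The unwrap-based path panics; here the panic is caught and reported.
    if let Some(msg) = panic_message_from_unwrap() {
        println!("caught panic: {msg}");
    }
}
```

The sketch only shows that a `SendError` generally means the receiving side of a channel was gone at the time of the send. It does not identify why that happens in Auron. Note that `catch_unwind` only catches the panic when the crate is built with unwinding panics, not with `panic = "abort"`.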
***Backtrace of the panic***
```
26/01/22 07:30:48 INFO Executor: Running task 19.1 in stage 114.0 (TID 1069)
0: __rustc::rust_begin_unwind
at
/rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
1: core::panicking::panic_fmt
at
/rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
2: core::result::unwrap_failed
at
/rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/result.rs:1732:5
3: auron::handle_unwinded_scope
4: auron::rt::NativeExecutionRuntime::start::{{closure}}
5: tokio::runtime::task::core::Core<T,S>::poll
6: tokio::runtime::task::raw::poll
7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
8: tokio::runtime::scheduler::multi_thread::worker::Context::run
9: tokio::runtime::context::scoped::Scoped<T>::set
10: tokio::runtime::context::runtime::enter_runtime
11: tokio::runtime::scheduler::multi_thread::worker::run
12: <tokio::runtime::blocking::task::BlockingTask<T> as
core::future::future::Future>::poll
13: tokio::runtime::task::core::Core<T,S>::poll
14: tokio::runtime::task::harness::Harness<T,S>::poll
15: tokio::runtime::blocking::pool::Inner::run
```
***Coredumps***
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007bb0b4aaa575, pid=1588146, tid=1877402
#
# JRE version: OpenJDK Runtime Environment (17.0.16+8) (build
17.0.16+8-Ubuntu-0ubuntu124.04.1)
# Java VM: OpenJDK 64-Bit Server VM (17.0.16+8-Ubuntu-0ubuntu124.04.1, mixed
mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc,
linux-amd64)
# Problematic frame:
# [2504.856s][info ][gc,start ] GC(692) Pause Young (Prepare Mixed)
(G1 Evacuation Pause)
[2504.856s][info ][gc,task ] GC(692) Using 43 workers of 43 for
evacuation
[2504.861s][info ][gc,phases ] GC(692) Pre Evacuate Collection Set:
0.2ms
[2504.861s][info ][gc,phases ] GC(692) Merge Heap Roots: 0.2ms
[2504.861s][info ][gc,phases ] GC(692) Evacuate Collection Set:
3.7ms
[2504.861s][info ][gc,phases ] GC(692) Post Evacuate Collection
Set: 1.1ms
[2504.861s][info ][gc,phases ] GC(692) Other: 0.2ms
[2504.861s][info ][gc,heap ] GC(692) Eden regions: 270->0(18)
[2504.861s][info ][gc,heap ] GC(692) Survivor regions: 4->4(35)
[2504.861s][info ][gc,heap ] GC(692) Old regions: 69->69
[2504.861s][info ][gc,heap ] GC(692) Archive regions: 2->2
[2504.861s][info ][gc,heap ] GC(692) Humongous regions: 40->40
[2504.861s][info ][gc,metaspace ] GC(692) Metaspace:
106036K(107648K)->106036K(107648K) NonClass: 93465K(94336K)->93465K(94336K)
Class: 12571K(13312K)->12571K(13312K)
[2504.861s][info ][gc ] GC(692) Pause Young (Prepare Mixed)
(G1 Evacuation Pause) 758M->218M(894M) 5.411ms
[2504.861s][info ][gc,cpu ] GC(692) User=0.08s Sys=0.01s Real=0.00s
C [libauron-4547940331120690501.tmp+0x16aa575][thread 1877388 also had an
error]
datafusion_ext_commons::arrow::eq_comparator::EqComparator::eq::hffa5a7c62813e2e3+0x35
#
# Core dump will be written. Default location:
/var/coredumps/core.%e.1588146.%t
#
# An error report file with more information is saved as:
#
/tmp/hadoop-saying/nm-local-dir/usercache/saying/appcache/application_1765502793146_0014/container_1765502793146_0014_01_000004/hs_err_pid1588146.log
#
# If you would like to submit a bug report, please visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-17
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
```
**To Reproduce**
This bug is very likely to reproduce when running TPC-DS Q95 or
TPC-H Q17.
**Additional context**
- The SIGILL core dump is not caused by a cross-platform compatibility issue;
Rust implements panic with `ud2` (an undefined instruction) to terminate the program.
- We are working on this issue. Please contact us if you'd like to help.
Thanks.