marin-ma opened a new pull request, #5657:
URL: https://github.com/apache/incubator-gluten/pull/5657

   Setting `spark.gluten.sql.columnar.backend.velox.IOThreads` to a value 
greater than 0 enables async IO. With async IO enabled, the program may crash 
at process exit, as shown in the backtrace below:
   
   ```
   (gdb) thread apply all bt
   
   Thread 2 (Thread 0x7fffebbff640 (LWP 1871752) "IOThreadPool0"):
   #0  0x00007ffff6eba637 in 
facebook::velox::process::ThreadLocalRegistry<facebook::velox::process::TraceHistory>::Reference::~Reference()
 () from /home/sparkuser/gluten/cpp/cmake-build-release/releases/libvelox.so
   #1  0x00007fffeea45d9f in __GI___call_tls_dtors () at 
./stdlib/cxa_thread_atexit_impl.c:159
   #2  0x00007fffeea94945 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:450
   #3  0x00007fffeeb26850 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
   
   Thread 1 (Thread 0x7fffee9cf440 (LWP 1871731) "generic_benchma"):
   #0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, 
op=265, expected=1871752, futex_word=0x7fffebbff910) at 
./nptl/futex-internal.c:57
   #1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, 
clockid=0, expected=1871752, futex_word=0x7fffebbff910) at 
./nptl/futex-internal.c:87
   #2  __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x7fffebbff910, expected=1871752, 
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) 
at ./nptl/futex-internal.c:139
   #3  0x00007fffeea96624 in __pthread_clockjoin_ex (threadid=140737148614208, 
thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at 
./nptl/pthread_join_common.c:105
   #4  0x00007fffeeedc2c7 in std::thread::join() () from 
/lib/x86_64-linux-gnu/libstdc++.so.6
   #5  0x00007ffff6f4b610 in folly::ThreadPoolExecutor::joinStoppedThreads 
(this=0x555555617640, n=1) at /usr/include/c++/11/bits/shared_ptr_base.h:1295
   #6  0x00007ffff6f4b774 in folly::ThreadPoolExecutor::stopAndJoinAllThreads 
(this=0x555555617640, isJoin=<optimized out>) at 
/home/sparkuser/gluten-debug/ep/build-velox/build/velox_ep/folly/folly/executors/ThreadPoolExecutor.cpp:290
   #7  0x00007ffff6f3f56d in folly::IOThreadPoolExecutor::~IOThreadPoolExecutor 
(this=this@entry=0x555555617640, __in_chrg=<optimized out>, 
__vtt_parm=<optimized out>) at 
/home/sparkuser/gluten-debug/ep/build-velox/build/velox_ep/folly/folly/executors/IOThreadPoolExecutor.cpp:130
   #8  0x00007ffff6f3f6ad in folly::IOThreadPoolExecutor::~IOThreadPoolExecutor 
(this=0x555555617640, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at 
/home/sparkuser/gluten-debug/ep/build-velox/build/velox_ep/folly/folly/executors/IOThreadPoolExecutor.cpp:131
   #9  0x00007ffff3393275 in std::unique_ptr<gluten::VeloxBackend, 
std::default_delete<gluten::VeloxBackend> >::~unique_ptr() () from 
/home/sparkuser/gluten/cpp/cmake-build-release/releases/libvelox.so
   #10 0x00007fffeea45a56 in __cxa_finalize (d=0x7ffff7e7f680) at 
./stdlib/cxa_finalize.c:83
   #11 0x00007ffff33017d7 in __do_global_dtors_aux () from 
/home/sparkuser/gluten/cpp/cmake-build-release/releases/libvelox.so
   #12 0x00007fffffffd270 in ?? ()
   #13 0x00007ffff7fc924e in _dl_fini () at ./elf/dl-fini.c:142
   Backtrace stopped: frame did not save the PC
   ```
   
   Currently, the `ioExecutor_` in `VeloxBackend` is destroyed in the 
`VeloxBackend` destructor on the main thread, which runs during global 
destruction (see `__cxa_finalize` in frame #10). Destroying the 
`IOThreadPoolExecutor` stops and joins all of its threads. When those threads 
exit, thread-local variables can be constructed with references to global 
variables that have already been destroyed by that point. So we need to 
destroy the `IOThreadPoolExecutor` and stop its threads before the global 
variables are destroyed.
   
   For example, in Velox's code at 
https://github.com/facebookincubator/velox/blob/5c4903fe2c1f640094295141206f23fe9a54c218/velox/common/process/TraceContext.cpp#L27-L31
 the thread-local variable `threadLocalTraceData` can be constructed on thread 
exit during the `VeloxBackend` destructor, but by then `registry` has already 
been destroyed.
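
The ordering problem described above can be sketched with the standard library alone. This is a minimal illustration, not the actual Gluten or Velox code: `Registry`, `ThreadHandle`, `IoPool`, `Backend`, `createBackend` and `tearDownBackend` are all hypothetical names, and `std::thread` stands in for `folly::IOThreadPoolExecutor`. The idea is that an explicit teardown call joins the IO threads while the globals they touch are still alive, instead of leaving the join to a global destructor.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a global that thread-local objects refer to.
struct Registry {
  std::atomic<int> liveThreads{0};
};

Registry& globalRegistry() {
  static Registry registry;  // destroyed during global destruction
  return registry;
}

// Thread-local handle: registers on first use, unregisters at thread exit.
struct ThreadHandle {
  ThreadHandle() { globalRegistry().liveThreads.fetch_add(1); }
  ~ThreadHandle() { globalRegistry().liveThreads.fetch_sub(1); }
};

void touchThreadLocal() {
  thread_local ThreadHandle handle;  // constructed once per thread
  (void)handle;
}

// Hypothetical pool: threads stay parked until the destructor stops them.
// Joining a thread waits for its thread-local destructors to finish.
class IoPool {
 public:
  explicit IoPool(int numThreads) {
    for (int i = 0; i < numThreads; ++i) {
      threads_.emplace_back([this] {
        touchThreadLocal();
        while (!stop_.load()) {
          std::this_thread::yield();
        }
      });
    }
  }

  ~IoPool() {
    stop_.store(true);
    for (auto& t : threads_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

 private:
  std::atomic<bool> stop_{false};  // declared before threads_ on purpose
  std::vector<std::thread> threads_;
};

class Backend {
 public:
  explicit Backend(int ioThreads) : ioPool_(new IoPool(ioThreads)) {}

 private:
  std::unique_ptr<IoPool> ioPool_;
};

// Global owner. Without an explicit teardown, its destructor would join the
// IO threads during global destruction, possibly after Registry is gone.
std::unique_ptr<Backend> gBackend;

void createBackend(int ioThreads) { gBackend.reset(new Backend(ioThreads)); }

// Explicit teardown: call this while all globals are still alive.
void tearDownBackend() { gBackend.reset(); }

// Creates the backend, tears it down explicitly, and returns the number of
// threads still registered afterwards.
int runDemo() {
  createBackend(2);
  tearDownBackend();
  assert(gBackend == nullptr);
  return globalRegistry().liveThreads.load();
}
```

Calling `tearDownBackend()` from a well-defined shutdown hook, rather than relying on the destructor of the global `std::unique_ptr`, means the later global destructor finds nothing left to join. Where such a hook would live in Gluten is not stated here and is left open by this sketch.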


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

