guowangy opened a new issue, #11895:
URL: https://github.com/apache/gluten/issues/11895

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   When running TPC-DS or heavy scan workloads on HDFS with `IOThreads > 0` and 
`SplitPreloadPerDriver > 0`, the JVM process can crash with SIGSEGV 
inside `jni_NewStringUTF` during `hdfsGetPathInfo()`. The crashing thread is a 
`CPUThreadPoolN` thread used for async split preloading.
   
   **Expected behavior**: HDFS file operations should work reliably on 
IOThreadPool threads across consecutive preload tasks.
   
   **Actual behavior**: After a certain number of tasks, the IOThreadPool 
thread crashes with SIGSEGV when calling `hdfsGetPathInfo()` via `libhdfs.so`.
   
   ## Root cause
   
   `libhdfs.so` caches `JNIEnv*` in an ELF thread-local (`__thread`) variable 
after the first `AttachCurrentThread` on each thread. The cached env is 
returned on all subsequent calls without re-validation (confirmed by 
disassembly of `libhdfs.so`'s `getJNIEnv` function).
   
   Gluten's `JniColumnarBatchIterator::~JniColumnarBatchIterator()` 
(`JniCommon.cc`) and `JavaInputStreamAdaptor::Close()` (`JniWrapper.cc`) call 
`vm_->DetachCurrentThread()` after JNI cleanup. This invalidates the `JNIEnv*` 
and frees the backing `JavaThread` object in the JVM. But libhdfs's TLS cache 
still holds the old pointer. On the next HDFS call, `libhdfs`'s `getJNIEnv()` 
returns the stale pointer, and the JVM crashes when it tries to transition the 
freed thread state.
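
   For illustration, the destructor-side pattern at those two call sites is roughly the following (a simplified sketch with hypothetical member names, not the actual Gluten code). The detach is legal JNI usage on its own, but it silently invalidates any `JNIEnv*` that other native libraries have cached for the same thread:

   ```cpp
   // Simplified sketch of the destructor-side detach (hypothetical names; the
   // real code lives in cpp/core/jni/JniCommon.cc and cpp/core/jni/JniWrapper.cc).
   #include <jni.h>

   class JniResourceSketch {
    public:
     ~JniResourceSketch() {
       JNIEnv* env = nullptr;
       if (vm_->GetEnv(reinterpret_cast<void**>(&env), JNI_VERSION_1_8) == JNI_OK) {
         env->DeleteGlobalRef(javaRef_);  // JNI cleanup of the wrapped Java object
       }
       // Frees the JavaThread backing this thread's JNIEnv*. Any env cached in
       // libhdfs's __thread variable on this thread is now dangling.
       vm_->DetachCurrentThread();
     }

    private:
     JavaVM* vm_ = nullptr;
     jobject javaRef_ = nullptr;
   };
   ```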
   
   ### Detailed mechanism
   
   **libhdfs `getJNIEnv` fast path** (from disassembly):
   ```
   1. __tls_get_addr() → get &(__thread hdfsTls*)
   2. if (tls_ptr != NULL) → return tls_ptr->env    // NO RE-VALIDATION
   3. else → slow path: AttachCurrentThread, cache env
   ```
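
   In C++ terms, the cached fast path behaves roughly as below (a sketch reconstructed from the disassembly summary above, not libhdfs source):

   ```cpp
   // Sketch of the caching behavior summarized above (not libhdfs source code).
   #include <jni.h>

   extern JavaVM* g_vm;  // assumed to be populated when the JVM is created/loaded

   JNIEnv* getCachedEnv() {
     // Analogous to libhdfs's __thread hdfsTls* cache.
     thread_local JNIEnv* tlsEnv = nullptr;
     if (tlsEnv != nullptr) {
       return tlsEnv;  // fast path: returned as-is, with no re-validation
     }
     // Slow path: attach this thread and cache the resulting env.
     g_vm->AttachCurrentThread(reinterpret_cast<void**>(&tlsEnv), nullptr);
     return tlsEnv;
   }
   ```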
   
   **After `DetachCurrentThread`**:
   - JVM frees the `JavaThread` object, reclaims the memory at the env address
   - libhdfs `__thread` TLS still holds the stale `hdfsTls*` → stale `env`
   - Next HDFS call → `getJNIEnv()` fast path returns stale env
   - `jni_NewStringUTF(stale_env, ...)` → computes `JavaThread* = env - 0x200` 
→ freed memory
   - JVM reads `*(JavaThread + 0x290)` — gets garbage (not the magic alive 
marker `0xdeab`)
   - JVM calls `block_if_vm_exited()`, sets JavaThread\* = NULL
   - `transition_from_native(NULL, ...)` → **SIGSEGV** at address 0x278
   
   ### Evidence from core dump
   
   Core dump: `core.CPUThreadPool21.1770392` (from TPC-DS benchmark on YARN)
   
   Registers at crash frame (`ThreadStateTransition::transition_from_native`):
   ```
   RDI = 0x0                    ← JavaThread* is NULL (set by block_if_vm_exited)
   R12 = 0x7f3003a52200         ← stale JNIEnv* from libhdfs TLS cache
   ```
   
   Memory at stale env (`0x7f3003a52200`):
   ```
   0x7f3003a52200: 0x0000000000000000  0x0000000000000000   ← JNI function table is NULL
   0x7f3003a52210: 0x0000001200000112  0x0000000000000000   ← JVM method resolution data (reused memory)
   ```
   
   Call chain (resolved from `libvelox.so` symbol table via `nm`):
   ```
   CPUThreadPool21 (preload task)
     → SplitReader::createReader()          [libvelox.so + 0x6173914]
       → HdfsFileSystem::openFileForRead()  [libvelox.so + 0x3787216]
         → HdfsReadFile::HdfsReadFile()     [libvelox.so + 0x378AB36, constructor]
           → driver_->GetPathInfo()
             → hdfsGetPathInfo()            [libhdfs.so]
               → getJNIEnv() → returns stale env
                 → jni_NewStringUTF(stale_env, path) → SIGSEGV
   ```
   
   ### How DetachCurrentThread gets called on CPUThreadPool threads
   
   The two call sites:
   1. `JniColumnarBatchIterator::~JniColumnarBatchIterator()` — 
`cpp/core/jni/JniCommon.cc`
   2. `JavaInputStreamAdaptor::Close()` — `cpp/core/jni/JniWrapper.cc`
   
   These objects are held via `shared_ptr` chains rooted in the Velox `Task`. 
When a task is terminated (e.g., by memory arbitration or 
`WholeStageResultIterator::~WholeStageResultIterator()` calling 
`task_->requestCancel()`), `Task::terminate()` calls `driver->closeByTask()` → 
`closeOperators()`, which destroys the `DataSource` objects and drops the last 
`shared_ptr` references. If this cleanup runs on a CPUThreadPool thread (e.g., 
triggered by a memory-pressure callback during a preload task), the destructor 
calls `DetachCurrentThread` on that thread.
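
   A minimal sketch of why the destructor ends up running on the pool thread (hypothetical type names, not Velox or Gluten code): `std::shared_ptr` runs the destructor on whichever thread drops the last reference, regardless of where the object was created.

   ```cpp
   // Minimal sketch (hypothetical names): the last shared_ptr reference dropped
   // on a pool thread runs the destructor, and hence DetachCurrentThread, there.
   #include <memory>
   #include <thread>
   #include <cstdio>

   struct DataSourceSketch {
     ~DataSourceSketch() {
       // In the Gluten case this is where DetachCurrentThread() is reached.
       std::printf("destructor ran on the thread that dropped the last reference\n");
     }
   };

   int main() {
     auto source = std::make_shared<DataSourceSketch>();
     // Simulates Task::terminate() cleanup holding the only remaining reference
     // while it runs on a CPUThreadPool thread.
     std::thread poolThread([ref = std::move(source)]() mutable {
       ref.reset();  // ~DataSourceSketch() runs here, on the pool thread
     });
     poolThread.join();
     return 0;
   }
   ```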
   
   Sequence:
   1. CPUThreadPool21 runs preload task A → libhdfs attaches thread, caches env 
in TLS
   2. Object cleanup on the same thread → destructor calls 
`DetachCurrentThread` → env invalidated, but libhdfs TLS still holds it
   3. CPUThreadPool21 runs preload task B → `hdfsGetPathInfo()` → stale env → 
**SIGSEGV**
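
   The full sequence can be reproduced in isolation with a standalone JNI program (a hedged sketch using the invocation API with made-up names; it stands in for libhdfs and the Gluten destructors rather than reusing their code, and step 3 is expected to crash):

   ```cpp
   // Standalone sketch of steps 1-3 above (hypothetical names; not libhdfs or
   // Gluten code). Build against a JDK's jni.h and link with -ljvm.
   #include <jni.h>
   #include <thread>

   static JavaVM* g_vm = nullptr;

   // Stand-in for libhdfs's getJNIEnv(): cache the env per thread and return the
   // cached value on later calls without re-validating it.
   static JNIEnv* cachedGetEnv() {
     thread_local JNIEnv* tlsEnv = nullptr;
     if (tlsEnv != nullptr) {
       return tlsEnv;  // fast path, no re-validation
     }
     g_vm->AttachCurrentThread(reinterpret_cast<void**>(&tlsEnv), nullptr);
     return tlsEnv;
   }

   int main() {
     JavaVMInitArgs args{};
     args.version = JNI_VERSION_1_8;
     JNIEnv* mainEnv = nullptr;
     JNI_CreateJavaVM(&g_vm, reinterpret_cast<void**>(&mainEnv), &args);

     std::thread worker([] {
       // Step 1: preload task A attaches the thread and caches the env.
       cachedGetEnv()->NewStringUTF("task A path");

       // Step 2: cleanup between tasks detaches the thread; the thread_local
       // cache above still holds the now-dangling env.
       g_vm->DetachCurrentThread();

       // Step 3: preload task B reuses the cached env; this is undefined
       // behavior and crashes in practice, mirroring the reported SIGSEGV.
       cachedGetEnv()->NewStringUTF("task B path");
     });
     worker.join();

     g_vm->DestroyJavaVM();
     return 0;
   }
   ```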
   
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   core dump back trace:
   
   
   Core: core.CPUThreadPool21.1770392
   #10 ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState)  [libjvm.so]
       RDI=0x0 (NULL JavaThread*), R12=0x7f3003a52200 (stale JNIEnv*)
   #11 jni_NewStringUTF                                                              [libjvm.so]
   #12 newJavaStr (env=0x7f3003a52200, path="/.../catalog_sales/...parquet")         [libhdfs.so]
   #13 constructNewObjectOfPath                                                      [libhdfs.so]
   #14 hdfsGetPathInfo                                                               [libhdfs.so]
   #15 HdfsReadFile::HdfsReadFile()                                                  [libvelox.so + 0x378AB36]
   #16 HdfsFileSystem::openFileForRead()                                             [libvelox.so + 0x3787216]
   ```

