vanshaj2023 commented on PR #49462:
URL: https://github.com/apache/arrow/pull/49462#issuecomment-4187882868

   Thanks for the review, @pitrou!
   
   **1. Why does this only appear in `arrow-json-test` and not other tests?**
   
   The crash surfaces in `ReaderTest.MultipleChunksParallel` because that test creates a **brand-new `ThreadPool`** and immediately dispatches work to it. The race window is extremely narrow: when `LaunchWorkersUnlocked` spawns a new thread, that thread calls `SetCurrentThreadPool(this)`, which writes to a `thread_local` variable before MinGW's `__emutls` has finished initializing TLS for the new thread. That write dereferences a stale or invalid pointer, and the process segfaults.
   
   Other tests that use the global default thread pool (created at startup via `ThreadPool::MakeEternal`) don't hit this because that pool is already **warm** by the time they run, so no new threads need to be spawned during the vulnerable window.
   
   The race can also be reproduced directly by calling `ThreadPool::Make(N)` in a tight loop on MinGW. A more reliable approach is to re-run the test in a shell loop until it fails:
   ```sh
   # Re-run the test until it exits non-zero (e.g. on the segfault).
   while ./arrow-json-test --gtest_filter=ReaderTest.MultipleChunksParallel; do :; done
   ```
   The `TlsPreservation` test added in `thread_pool_test.cc` exercises the same code path (`OwnsThisThread()` → `GetCurrentThreadPool()` → `TlsGetValue`) directly from the `ThreadPool` tests.
   
   **2. Is there an upstream MinGW/GCC issue?**
   
   Yes, this is tracked as [GCC Bug #78605](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78605), which documents the `__emutls` race condition during thread startup. I've updated the code comment in `thread_pool.cc` to reference this upstream bug alongside the Arrow issue link.


-- 
This is an automated message from the Apache Git Service.