Inokinoki commented on issue #40088:
URL: https://github.com/apache/arrow/issues/40088#issuecomment-2865062539

   > Thanks for the update! Yes that's also my doubt.
   > 
   > 
   > I will keep that in mind and try to continue debugging to figure out what
   > is the case.
   
   I investigated further and reached a similar conclusion: the issue is caused 
by two different versions of `google::protobuf::internal::OnShutdownRun`.
   
   The functions themselves are fine within each of the two libraries, because 
the libraries end up being loaded implicitly with `RTLD_LOCAL`, as you said.
   
   I think the problem is the singleton data they refer to, 
`google::protobuf::internal::ShutdownData::get()::data`, which is created only 
once, on demand:
   
   ```
   auto shutdown_data = ShutdownData::get();
   ```
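
To make the "created only once on demand" part concrete, here is a minimal sketch of the pattern. It is not protobuf's actual source: `ShutdownRegistry` and `on_shutdown_run` are hypothetical stand-ins, and the member layout is only an assumption for illustration. The function-local static is what makes the compiler emit a guard variable like the one seen in the debugger output.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shutdown singleton; NOT protobuf's real code.
struct ShutdownRegistry {
  using Callback = void (*)(const void*);

  // Function-local static: initialized once, on the first call. The compiler
  // emits a hidden guard variable to record whether that has happened yet.
  static ShutdownRegistry* get() {
    static ShutdownRegistry* data = new ShutdownRegistry;
    return data;
  }

  std::vector<std::pair<Callback, const void*>> functions;
  std::mutex mutex;  // assumption: a build using absl would hold absl::Mutex here
};

// Registers a callback while holding the singleton's lock.
inline void on_shutdown_run(ShutdownRegistry::Callback f, const void* arg) {
  assert(f != nullptr);
  ShutdownRegistry* data = ShutdownRegistry::get();
  std::lock_guard<std::mutex> lock(data->mutex);
  data->functions.emplace_back(f, arg);
}
```

If two builds of such a function ended up sharing one instance while disagreeing on the type of the `mutex` member, each would lock it with its own mutex implementation, which is the kind of mismatch described in this comment.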
   
   The layout of this data is expected to differ between the two versions: one 
has an `absl` mutex as a member, the other a `std::mutex`.
   
   I'm not sure whether this is a macOS issue or not, but it seems that this 
"local" variable is shared between the two versions, which causes the problem.
   
   ----------------------------------
   
   Here is the debugging info:
   
   I set a breakpoint on `google::protobuf::internal::OnShutdownRun` and then 
imported `pyarrow` first. Its assembly in `libarrow` is as follows:
   
   ```
   * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
       frame #0: 0x000000010732a0f4 
libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun(void (*)(void 
const*), void const*)
   libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun:
       0x10732a0f4 <+0>:  stp    x26, x25, [sp, #-0x50]!
       0x10732a0f8 <+4>:  stp    x24, x23, [sp, #0x10]
       0x10732a0fc <+8>:  stp    x22, x21, [sp, #0x20]
       0x10732a100 <+12>: stp    x20, x19, [sp, #0x30]
       0x10732a104 <+16>: stp    x29, x30, [sp, #0x40]
       0x10732a108 <+20>: add    x29, sp, #0x40
       0x10732a10c <+24>: mov    x20, x1
       0x10732a110 <+28>: mov    x21, x0
       0x10732a114 <+32>: adrp   x8, 1794
   ->  0x10732a118 <+36>: ldr    x8, [x8, #0x340]
       0x10732a11c <+40>: ldaprb w8, [x8]
       0x10732a120 <+44>: adrp   x19, 1796
       0x10732a124 <+48>: ldr    x19, [x19, #0x50]
       0x10732a128 <+52>: tbz    w8, #0x0, 0x10732a238 ; <+324>
       0x10732a12c <+56>: ldr    x22, [x19]
       0x10732a130 <+60>: add    x19, x22, #0x18
       0x10732a134 <+64>: mov    x0, x19
       0x10732a138 <+68>: bl     0x107531dd0    ; symbol stub for: 
std::__1::mutex::lock()
       0x10732a13c <+72>: ldp    x23, x8, [x22, #0x8]
       0x10732a140 <+76>: cmp    x23, x8
       0x10732a144 <+80>: b.hs   0x10732a158    ; <+100>
       0x10732a148 <+84>: stp    x21, x20, [x23]
       0x10732a14c <+88>: add    x8, x23, #0x10 
   ```
   
   The marked instruction is where the singleton data is fetched. Reading the 
register there gives:
   
   ```
   x8 = 0x0000000107b2b8c8  guard variable for 
google::protobuf::internal::ShutdownData::get()::data
   ```
   
   Note that `0x10732a138 <+68>` is a direct call to the C++ standard library's 
mutex lock. That is fine here, because the singleton data also contains a 
`std::mutex` member.
   
   I let it continue running and then imported `tink` (which uses a newer 
version with an `absl` mutex).
   
   ```
   * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
       frame #0: 0x0000000103f5b104 
tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun(void
 (*)(void const*), void const*)
   
tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun:
       0x103f5b104 <+0>:  stp    x28, x27, [sp, #-0x60]!
       0x103f5b108 <+4>:  stp    x26, x25, [sp, #0x10]
       0x103f5b10c <+8>:  stp    x24, x23, [sp, #0x20]
       0x103f5b110 <+12>: stp    x22, x21, [sp, #0x30]
       0x103f5b114 <+16>: stp    x20, x19, [sp, #0x40]
       0x103f5b118 <+20>: stp    x29, x30, [sp, #0x50]
       0x103f5b11c <+24>: add    x29, sp, #0x50
       0x103f5b120 <+28>: mov    x20, x1
       0x103f5b124 <+32>: mov    x21, x0
       0x103f5b128 <+36>: adrp   x8, 501
   ->  0x103f5b12c <+40>: ldr    x8, [x8, #0xe8]
       0x103f5b130 <+44>: ldaprb w8, [x8]
       0x103f5b134 <+48>: adrp   x19, 502
       0x103f5b138 <+52>: ldr    x19, [x19, #0xaa0]
       0x103f5b13c <+56>: tbz    w8, #0x0, 0x103f5b218 ; <+276>
       0x103f5b140 <+60>: ldr    x22, [x19]
       0x103f5b144 <+64>: add    x19, x22, #0x18
       0x103f5b148 <+68>: mov    x0, x19
   ->  0x103f5b14c <+72>: bl     0x104026a88    ; 
absl::lts_20240722::Mutex::Lock()
       0x103f5b150 <+76>: ldp    x9, x8, [x22, #0x8]
       0x103f5b154 <+80>: cmp    x9, x8
       0x103f5b158 <+84>: b.hs   0x103f5b16c    ; <+104>
   ```
   
   Reading the register at the first arrow gives the same address:
   
   ```
   x8 = 0x00000001083278c8  guard variable for 
google::protobuf::internal::ShutdownData::get()::data
   ```
   
   The data was already created while importing `pyarrow`, and it has a 
`std::mutex` member.
   
   But the `tink` code then calls the `absl` mutex lock on it, which expects an 
`absl` mutex, and this can crash the program.
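
As a possible cross-check outside the debugger, the address held in `x8` could be mapped back to the image that owns it with `dladdr`. This is only a sketch under assumptions: `image_of` is a hypothetical helper and was not part of the investigation above. On older glibc systems it may also need linking with `-ldl`.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: returns the path of the loaded image containing `addr`,
// or an empty string if the dynamic loader cannot match the address.
inline std::string image_of(const void* addr) {
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
    return {};
  }
  return info.dli_fname;
}
```

Calling `image_of` on the guard variable address from inside each library would show whether both copies of `OnShutdownRun` resolve the singleton to the same image.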
   
   ----------------
   
   Hope that this helps!
   

