Inokinoki commented on issue #40088:
URL: https://github.com/apache/arrow/issues/40088#issuecomment-2865062539
> Thanks for the update! Yes that's also my doubt.
>
> I will keep that in mind and try to continue debugging to figure out what
> is the case.
I investigated further, and the conclusion is still similar: the issue is
caused by the two different versions of
`google::protobuf::internal::OnShutdownRun`.
Each version works fine within its own library, because, as you said, both
libraries are ultimately loaded implicitly with `RTLD_LOCAL`.
I think the problem comes from the singleton data both versions refer to,
`google::protobuf::internal::ShutdownData::get()::data`, which is created only
once, on demand:
```
auto shutdown_data = ShutdownData::get();
```
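This is a minimal sketch of the function-local static singleton pattern, for readers less familiar with it. `ShutdownData` and `OnShutdownRun` here are simplified, hypothetical stand-ins loosely modeled on protobuf's internals: a list of (function, argument) pairs guarded by a mutex. They are not protobuf's actual source. `RunShutdownCallbacks` is a hypothetical helper added only for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for protobuf's internal ShutdownData.
// NOT protobuf's source: it only mirrors the general shape, a list of
// (function, argument) pairs guarded by a mutex.
struct ShutdownData {
  std::vector<std::pair<void (*)(const void*), const void*>> functions;
  std::mutex mutex;

  // Function-local static: the object is created once, on the first call.
  // The compiler protects this initialization with a hidden guard variable.
  static ShutdownData* get() {
    static ShutdownData* data = new ShutdownData;
    return data;
  }
};

// Registers f(arg) to be run at shutdown.
void OnShutdownRun(void (*f)(const void*), const void* arg) {
  assert(f != nullptr);
  auto shutdown_data = ShutdownData::get();
  std::lock_guard<std::mutex> lock(shutdown_data->mutex);
  shutdown_data->functions.emplace_back(f, arg);
}

// Hypothetical helper (illustration only): runs the registered callbacks
// in reverse registration order, outside the lock.
void RunShutdownCallbacks() {
  std::vector<std::pair<void (*)(const void*), const void*>> to_run;
  {
    auto shutdown_data = ShutdownData::get();
    std::lock_guard<std::mutex> lock(shutdown_data->mutex);
    to_run.swap(shutdown_data->functions);
  }
  for (auto it = to_run.rbegin(); it != to_run.rend(); ++it) {
    it->first(it->second);
  }
}
```

This is safe within a single library: every caller goes through `ShutdownData::get()` and sees the same object, which was built from the same definition.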
This data is expected to differ between the two versions: one has an `absl`
mutex as a member, the other a `std::mutex`.
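The sketch below shows the two conflicting layouts under separate hypothetical names. In the real libraries both types are named `google::protobuf::internal::ShutdownData`, and that shared name is what makes sharing one instance dangerous. `WordMutex` is a hypothetical one-word spin lock standing in for the `absl` mutex. It is not absl's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

using ShutdownFunctions =
    std::vector<std::pair<void (*)(const void*), const void*>>;

// Hypothetical one-word spin lock, standing in for the absl mutex.
// NOT absl::Mutex; it only illustrates "a different mutex type".
class WordMutex {
 public:
  void lock() {
    std::intptr_t expected = 0;
    // Spin until the word flips from 0 (unlocked) to 1 (locked).
    while (!word_.compare_exchange_weak(expected, 1,
                                        std::memory_order_acquire)) {
      expected = 0;  // compare_exchange overwrote it with the current value
    }
  }
  void unlock() {
    assert(word_.load(std::memory_order_relaxed) == 1);
    word_.store(0, std::memory_order_release);
  }

 private:
  std::atomic<std::intptr_t> word_{0};
};

// Layout as compiled into the older protobuf (the libarrow side).
struct ShutdownDataWithStdMutex {
  ShutdownFunctions functions;
  std::mutex mutex;
};

// Layout as compiled into the newer protobuf (the tink side).
struct ShutdownDataWithWordMutex {
  ShutdownFunctions functions;
  WordMutex mutex;
};
```

If one library constructs the object with the first layout and the other then locks it through the second, the second library presumably treats part of a `std::mutex` as its own mutex state. That is undefined behavior.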
I'm not sure whether this is a macOS-specific issue, but it seems that the
"local" static variable is shared between the two versions, which causes the
problem.
----------------------------------
Here is the debugging info.
I set a breakpoint on `google::protobuf::internal::OnShutdownRun` and then
imported `pyarrow` first. The disassembly of the function in `libarrow` is:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
frame #0: 0x000000010732a0f4 libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun(void (*)(void const*), void const*)
libarrow.2000.dylib`google::protobuf::internal::OnShutdownRun:
0x10732a0f4 <+0>: stp x26, x25, [sp, #-0x50]!
0x10732a0f8 <+4>: stp x24, x23, [sp, #0x10]
0x10732a0fc <+8>: stp x22, x21, [sp, #0x20]
0x10732a100 <+12>: stp x20, x19, [sp, #0x30]
0x10732a104 <+16>: stp x29, x30, [sp, #0x40]
0x10732a108 <+20>: add x29, sp, #0x40
0x10732a10c <+24>: mov x20, x1
0x10732a110 <+28>: mov x21, x0
0x10732a114 <+32>: adrp x8, 1794
-> 0x10732a118 <+36>: ldr x8, [x8, #0x340]
0x10732a11c <+40>: ldaprb w8, [x8]
0x10732a120 <+44>: adrp x19, 1796
0x10732a124 <+48>: ldr x19, [x19, #0x50]
0x10732a128 <+52>: tbz w8, #0x0, 0x10732a238 ; <+324>
0x10732a12c <+56>: ldr x22, [x19]
0x10732a130 <+60>: add x19, x22, #0x18
0x10732a134 <+64>: mov x0, x19
0x10732a138 <+68>: bl 0x107531dd0 ; symbol stub for: std::__1::mutex::lock()
0x10732a13c <+72>: ldp x23, x8, [x22, #0x8]
0x10732a140 <+76>: cmp x23, x8
0x10732a144 <+80>: b.hs 0x10732a158 ; <+100>
0x10732a148 <+84>: stp x21, x20, [x23]
0x10732a14c <+88>: add x8, x23, #0x10
```
The instruction marked with `->` is where the singleton data is looked up
(through its guard variable). Reading the registers at that point shows:
```
x8 = 0x0000000107b2b8c8 guard variable for google::protobuf::internal::ShutdownData::get()::data
```
Note that the instruction at `0x10732a138 <+68>` is a direct call into the C++
standard library's `std::mutex::lock()`. That is fine here, because the
singleton data also contains a `std::mutex` member.
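The guard variable in the register above is the compiler-generated flag for that function-local static. The sketch below is a rough, hand-written version of what `static T* data = new T;` turns into. In real code, the slow path calls `__cxa_guard_acquire`/`__cxa_guard_release` from the Itanium C++ ABI. Here a `std::mutex` stands in for those calls, and `Payload`/`GetPayload` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload; stands in for ShutdownData.
struct Payload {
  int value = 42;
};

namespace {
std::atomic<bool> g_guard{false};  // role of the compiler's guard variable
std::mutex g_init_mutex;           // role of __cxa_guard_acquire/_release
Payload* g_data = nullptr;         // storage of the function-local static
}  // namespace

// Hand-written equivalent of:
//   Payload* GetPayload() { static Payload* data = new Payload; return data; }
Payload* GetPayload() {
  // Fast path: acquire-load the guard (cf. `ldaprb w8, [x8]` above).
  if (!g_guard.load(std::memory_order_acquire)) {
    // Slow path: serialize initialization.
    std::lock_guard<std::mutex> lock(g_init_mutex);
    // Re-check: another thread may have finished initializing meanwhile.
    if (!g_guard.load(std::memory_order_relaxed)) {
      g_data = new Payload;
      g_guard.store(true, std::memory_order_release);
    }
  }
  // The pointer is published before the guard is set.
  assert(g_data != nullptr);
  return g_data;
}
```

Suppose two libraries do end up sharing one guard and its storage. Then only the first caller runs its constructor. Every later caller sees the guard already set and reuses the existing pointer, whatever layout that pointer was built with.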
I then let it continue and imported `tink`, which uses a newer protobuf
version with an `absl` mutex.
```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
frame #0: 0x0000000103f5b104 tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun(void (*)(void const*), void const*)
tink_bindings.cpython-310-darwin.so`google::protobuf::internal::OnShutdownRun:
0x103f5b104 <+0>: stp x28, x27, [sp, #-0x60]!
0x103f5b108 <+4>: stp x26, x25, [sp, #0x10]
0x103f5b10c <+8>: stp x24, x23, [sp, #0x20]
0x103f5b110 <+12>: stp x22, x21, [sp, #0x30]
0x103f5b114 <+16>: stp x20, x19, [sp, #0x40]
0x103f5b118 <+20>: stp x29, x30, [sp, #0x50]
0x103f5b11c <+24>: add x29, sp, #0x50
0x103f5b120 <+28>: mov x20, x1
0x103f5b124 <+32>: mov x21, x0
0x103f5b128 <+36>: adrp x8, 501
-> 0x103f5b12c <+40>: ldr x8, [x8, #0xe8]
0x103f5b130 <+44>: ldaprb w8, [x8]
0x103f5b134 <+48>: adrp x19, 502
0x103f5b138 <+52>: ldr x19, [x19, #0xaa0]
0x103f5b13c <+56>: tbz w8, #0x0, 0x103f5b218 ; <+276>
0x103f5b140 <+60>: ldr x22, [x19]
0x103f5b144 <+64>: add x19, x22, #0x18
0x103f5b148 <+68>: mov x0, x19
-> 0x103f5b14c <+72>: bl 0x104026a88 ; absl::lts_20240722::Mutex::Lock()
0x103f5b150 <+76>: ldp x9, x8, [x22, #0x8]
0x103f5b154 <+80>: cmp x9, x8
0x103f5b158 <+84>: b.hs 0x103f5b16c ; <+104>
```
When I read the register at the first arrow, it points to the same guard
variable:
```
x8 = 0x00000001083278c8 guard variable for google::protobuf::internal::ShutdownData::get()::data
```
The data was already created while importing `pyarrow`, so it contains a
`std::mutex` member.
But this code then calls `absl::Mutex::Lock()` on it (the second arrow), which
expects an `absl` mutex rather than a `std::mutex`, and that can crash the
program.
----------------
Hope this helps!
--
This is an automated message from the Apache Git Service.