https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436

            Bug ID: 125436
           Summary: calls to __tls_get_addr in ms_abi with -fPIC function
                    clobber callee-save registers
           Product: gcc
           Version: 13.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: puetzk at puetzk dot org
  Target Milestone: ---

Created attachment 64536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64536&action=edit
g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so

If a function compiled with -fPIC accesses a `thread_local` variable, the
resulting code will call __tls_get_addr() (at least for the default
-mtls-dialect=gnu).

However, if the function containing the TLS access is `__attribute((ms_abi))`,
or is inlined into a function that is `__attribute((ms_abi))`, the code
generation does not appear to account for the possibility that this implicit
function call will use the sysv_abi interpretation of caller-save/callee-save,
and therefore __tls_get_addr (or particularly __tls_get_addr_slow) may clobber
registers that the caller did not preserve, corrupting state upon return.

For example, the attached code miscompiles on ubuntu 24.04 g++
13.3.0-6ubuntu2~24.04.1

g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so
gcc -g -DUSE_DL main.c -ldl
./a.out

> ms_abi = ac6778d0 (garbage from __tls_get_addr_slow, varies from run to run)
> sysv = 12345678 (correct answer)
> ms_abi = 54479fb8 (different garbage, since now the DTV is already set)
> ms_abi = 54479fb8 (stable from now on)


As seen from https://godbolt.org/z/YhqqhjnbT

> ms_foo:
>         push    rdi     #
>         push    rsi     #
>         push    rbx     #
>         test    ecx, ecx        # i
>         js      .L8 #,
>         mov     ebx, ecx  # i, tmp95
> # /app/example.cpp:29:    ret = magic_number;
>         mov     edi, DWORD PTR magic_number[rip]     # <retval>, magic_number
> 
> # /app/example.cpp:35:    tls_instance += i;
>         lea     rdi, tls_instance@tlsld[rip]
>         call    __tls_get_addr@PLT          #
>         mov     rsi, rax  # tmp86, tmp96
>         add     DWORD PTR tls_instance@dtpoff[rsi], ebx      # tls_instance, i
> .L6:
> # /app/example.cpp:46: }
>         mov     eax, edi  #, <retval>
>         pop     rbx       #
>         pop     rsi       #
>         pop     rdi       #
>         ret     
> .L8:
> # /app/example.cpp:20:  int ret = -1;
>         mov     edi, -1   # <retval>,
> # /app/example.cpp:45:  return ret;
>         jmp     .L6       #

So ret is allocated into edi, which is clobbered during the access to
tls_instance, then edi is transferred back into eax for the return value.
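
For readers without the attachment, a function of roughly the following shape
matches the source lines quoted in the listing comments. This is only a sketch:
the names come from the listing, the value of magic_number is inferred from the
"sysv = 12345678" output line, the `static` on tls_instance is an assumption
based on the @tlsld access, and the MS_ABI guard macro is hypothetical and is
there only so the sketch compiles on other targets.

```cpp
// Guard macro (not in the original): lets the sketch build off x86-64.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define MS_ABI __attribute__((ms_abi))
#else
#define MS_ABI
#endif

// Value inferred from the "sysv = 12345678" output line.
int magic_number = 0x12345678;

// Assumed declaration; the listing only shows it is accessed via @tlsld.
static thread_local int tls_instance = 0;

extern "C" MS_ABI int ms_foo(int i)
{
    int ret = -1;
    if (i >= 0) {
        ret = magic_number;  // must stay live across the TLS access below
        tls_instance += i;   // -fPIC in a shared object: call to __tls_get_addr
    }
    return ret;
}
```

The problem only shows up when this is built as a shared object with -fPIC,
since that is what makes the TLS access go through __tls_get_addr.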

Flipping through versions on Compiler Explorer, codegen very similar to this
is seen in every gcc version <= 15.1.

Something changed in 15.2, because it gets
>        mov     ebp, DWORD PTR magic_number[rip]

i.e. `ret` is allocated in ebp rather than edi. This avoids the bug (since ebp
is callee-save in both ms_abi and sysv_abi), but I think it's an incidental
register-allocation change rather than an actual fix, because the other
register preserving needed at a boundary between ms_abi and sysv_abi does not
appear, meaning __tls_get_addr could still clobber other things that the caller
of ms_foo() expected it to preserve.

gcc 16.1 (and trunk) change back to allocating `ret` in edi, but now they hoist
the __tls_get_addr call above the load of magic_number:

>         lea     rdi, "tls_instance"@tlsld[rip]       #
>         call    "__tls_get_addr"@PLT            #
> # /app/example.cpp:10:    ret = magic_number;
>         mov     edx, DWORD PTR "magic_number"[rip]   # <retval>, magic_number


This too avoids the immediate symptom, but still leaves ms_foo in violation of
its __attribute__((ms_abi)) by having the __tls_get_addr call potentially
clobber the registers (RDI, RSI, XMM6-XMM15, I think?) that ms_abi considers
callee-save but sysv_abi considers caller-save.

FWIW, clang gets this right, and so does gcc when there's actually an explicit
call to a sysv_abi function in the source code, rather than an implicit call to
__tls_get_addr as an implementation detail.
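
Given that, one possible workaround is to put the TLS access behind an explicit,
non-inlined sysv_abi helper, so that the ms_abi function makes a visible
cross-ABI call. This is a sketch that I have not verified against the affected
compilers; add_to_tls, ms_foo_workaround, and the guard macros are hypothetical
names, and it assumes that gcc's handling of explicit sysv_abi calls is what
makes the difference.

```cpp
// Guard macros (not in the original): let the sketch build off x86-64.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#define NOINLINE __attribute__((noinline))
#else
#define MS_ABI
#define SYSV_ABI
#define NOINLINE
#endif

int magic_number = 0x12345678;
static thread_local int tls_instance = 0;

// Hypothetical helper. noinline matters: if this were inlined into the
// ms_abi caller, the implicit __tls_get_addr call would move there too.
static NOINLINE SYSV_ABI void add_to_tls(int i)
{
    tls_instance += i;
}

extern "C" MS_ABI int ms_foo_workaround(int i)
{
    int ret = -1;
    if (i >= 0) {
        ret = magic_number;
        add_to_tls(i);  // explicit ms_abi -> sysv_abi call
    }
    return ret;
}
```

The cost is an extra call on every access, so this is only a stopgap for code
that cannot wait for a compiler fix.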

I can't find anything in
https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt that suggests
anything other than treating __tls_get_addr as a normal sysv_abi function, e.g.
it says

> The functions defined above [ the @TLSCALL ones for -mtls-dialect=gnu2 ] use 
> custom calling conventions that require them to preserve any registers they 
> modify. This penalizes the case that requires dynamic TLS, since it must 
> preserve (*) all call-clobbered registers before calling __tls_get_addr()

Which certainly sounds to me like `__tls_get_addr` does *not* make any special
promises, and is free to clobber anything other than RBX, RSP, RBP, and
R12–R15.
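
To spell out which registers are at risk, the difference between the two
callee-saved sets can be computed directly. This is just an illustration; the
register lists are the ones named in this report (and the published ABIs as I
understand them), and the function names are made up for the sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Registers a sysv_abi callee must preserve.
RegSet sysv_callee_saved()
{
    return {"RBX", "RSP", "RBP", "R12", "R13", "R14", "R15"};
}

// Registers an ms_abi callee must preserve: the sysv set plus
// RDI, RSI, and XMM6 through XMM15.
RegSet ms_callee_saved()
{
    RegSet regs = sysv_callee_saved();
    regs.insert("RDI");
    regs.insert("RSI");
    for (int n = 6; n <= 15; ++n)
        regs.insert("XMM" + std::to_string(n));
    return regs;
}

// Registers the caller of an ms_abi function expects to survive, but which
// a sysv_abi callee such as __tls_get_addr is free to clobber.
RegSet at_risk()
{
    const RegSet ms = ms_callee_saved();
    const RegSet sysv = sysv_callee_saved();
    RegSet diff;
    std::set_difference(ms.begin(), ms.end(), sysv.begin(), sysv.end(),
                        std::inserter(diff, diff.end()));
    return diff;
}
```

at_risk() yields RDI, RSI, and XMM6 through XMM15, which is the set an ms_abi
function would need to save around the __tls_get_addr call.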
