https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436

--- Comment #9 from Kevin Puetz <puetzk at puetzk dot org> ---
Created attachment 64546
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64546&action=edit
g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so (demonstrate corruption in gcc
16.1.0)

Ok, that was the easy way but I was afraid you'd say the asm-reg stuff was
sketchy. I did not have hard registers assigned like that in the original case,
I just had a lot of inlining and enough register complexity that it eventually
ended up using esi for something. But I lost that pretty early in the attempts
to minimize it, because esi is a long way down the list of preferred registers.
I assume that's because it's callee-preserved, so the allocator would rather use
something volatile it can just clobber. But after enough tinkering I came up
with a structure to synthetically create a lot of simultaneously-live registers
without it actually being complicated, and where it mostly doesn't matter which
variable(s) get clobbered.

This one works with the same main.c, but its output is the opposite. I load
many copies of magic_number, xor-ing them all together so the result depends on
all the variables (making them all live). Since they are loaded from a volatile
source, the optimizer can't actually assume they are all the same, and must
store each one, creating enough pressure that it eventually allocates one of
them (`h`) into esi.

But we know they are all the same, so the correct answer is "0". An even number
of copies of the same value, xor'ed together, should cancel out. But if/when
esi gets clobbered with nullptr (the initial value of dtv in the first call to
__tls_get_addr_slow), that incorrectly zeros `h`, leaving an odd number of
intact copies. The "incorrect" result then prints `ms_abi = 123456768`, a
nonzero value that shows through because clobbering `h` left the cancellation
incomplete.
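
As a rough illustration of the cancellation idea only (this is not the
attached foo.cpp; `sketch_magic`, its value, and `xor_of_copies` are
hypothetical names made up for this sketch), the clobber is simulated here by
zeroing one copy by hand instead of relying on a real call to __tls_get_addr:

```cpp
#include <cstdint>

// Hypothetical stand-in for magic_number; the value is illustrative only.
// volatile forces a separate load for every copy, so the optimizer cannot
// assume the copies are equal.
volatile std::uint32_t sketch_magic = 0x12345678u;

// Loads eight copies and xors them together. When simulate_clobber is true,
// one copy is zeroed by hand to mimic a register being overwritten between
// the loads and the final xor.
std::uint32_t xor_of_copies(bool simulate_clobber) {
    std::uint32_t a = sketch_magic, b = sketch_magic, c = sketch_magic,
                  d = sketch_magic, e = sketch_magic, f = sketch_magic,
                  g = sketch_magic, h = sketch_magic;
    if (simulate_clobber) {
        h = 0;  // stands in for esi being overwritten with nullptr
    }
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}
```

With all eight copies intact the xor is 0; with one copy zeroed, seven copies
remain and the value itself shows through.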

I also added a pair of `double` values, which end up in xmm6 and xmm7, and
cancel those out using subtraction. I don't actually see those get corrupted,
but AFAIK there's no reason __tls_get_addr wouldn't be allowed to clobber them;
xmm6/7 are volatile in sysv_abi (but nonvolatile in ms_abi).
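
The floating-point check can be sketched the same way, again with
hypothetical names (`sketch_magic_double`, `difference_of_copies`) and a
hand-simulated clobber, not the code from the attachment:

```cpp
// Hypothetical stand-in for the volatile source of the two doubles; the
// value is illustrative only.
volatile double sketch_magic_double = 1.5;

// Loads two copies and subtracts them. When simulate_clobber is true, one
// copy is zeroed by hand to mimic an xmm register being overwritten.
double difference_of_copies(bool simulate_clobber) {
    double x = sketch_magic_double;
    double y = sketch_magic_double;
    if (simulate_clobber) {
        y = 0.0;  // stands in for xmm6/xmm7 being trashed by a callee
    }
    return x - y;
}
```

Intact copies subtract to exactly 0.0; a zeroed copy leaves the original
value visible in the result.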

The generated asm shows that neither ms_tls_access nor ms_foo does anything
to preserve them before calling the sysv_abi function __tls_get_addr, so the
risk is still there, unless glibc makes a special promise somewhere that
__tls_get_addr will never use xmm* (e.g. that it will never call a malloc that
uses an SSE memset).

If anything in __tls_get_addr were to touch xmm6-15, that would make
ms_tls_access break its claimed calling convention by trashing registers that
are supposed to be callee-preserved.
