https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436
--- Comment #9 from Kevin Puetz <puetzk at puetzk dot org> ---
Created attachment 64546
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64546&action=edit
g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so (demonstrate corruption in gcc 16.1.0)

Ok, that was the easy way, but I was afraid you'd say the asm-reg stuff was sketchy. I did not have hard registers assigned like that in the original case; I just had a lot of inlining and enough register complexity that it eventually ended up using esi for something. But I lost that pretty early in the attempts to minimize it, because esi is a long way down the list of preferred registers. I assume that's because it's callee-preserved, so the allocator would rather use something volatile it can just clobber.

But after enough tinkering I came up with a structure to synthetically create a lot of simultaneously-live registers without it actually being complicated, and where it mostly doesn't matter which variable(s) get clobbered. This one works with the same main.c, but its output is the opposite.

I load many copies of magic_number, xor-ing them all together so the result depends on all the variables (making them all live). Since they are loaded from a volatile source, the optimizer can't actually assume they are all the same and must keep each one, creating enough pressure that it eventually allocates one of them (`h`) into esi. But we know they are all the same, so the correct answer is "0": an even number of copies of the same value, xor'ed together, should cancel out. But if/when esi gets clobbered with nullptr (the initial value of dtv in the first call to __tls_get_addr_slow), that incorrectly zeros `h`, leaving an odd number of intact registers. So the "incorrect" result is printing `ms_abi = 123456768`, which shows through because clobbering `h` leaves the cancellation incomplete.

I also added a pair of `double` values, which end up in xmm6 and 7, and cancel those out using subtraction.
I don't actually see those get corrupted, but AFAIK there's no reason __tls_get_addr wouldn't be allowed to; xmm6/7 are volatile in sysv_abi (but nonvolatile in ms_abi). The generated asm shows that neither ms_tls_access nor ms_foo does anything to preserve them before calling the sysv_abi function __tls_get_addr, so the risk is still there, unless glibc makes a special promise somewhere that __tls_get_addr will never use xmm* (which it could end up doing indirectly, e.g. via a malloc that uses an SSE memset). If anything in __tls_get_addr were to touch xmm6-15, that would make ms_tls_access break its claimed calling convention by trashing registers that are supposed to be callee-preserved.
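
For readers without the attachment, the xor-cancellation structure described above can be sketched roughly as follows. This is not the attached foo.cpp: the value given to `magic_number` and the names `tls_counter` and `xor_cancel` are assumptions made up for illustration, and the ms_abi attributes and `double` values are left out so it stays portable. Built as an ordinary executable it will most likely never call __tls_get_addr at all, so it only shows the shape of the check, not the corruption.

```cpp
#include <cstdint>

// Hypothetical volatile source; the value is an assumption, any value works.
static volatile std::uint32_t magic_number = 0x12345678u;

// Hypothetical thread-local variable. In a -fPIC shared library, touching it
// is the kind of access that can go through __tls_get_addr.
static thread_local int tls_counter = 0;

std::uint32_t xor_cancel()
{
    // Each read of a volatile object is a separate load, so the compiler
    // cannot assume the eight copies are equal and must keep all of them.
    std::uint32_t a = magic_number, b = magic_number,
                  c = magic_number, d = magic_number,
                  e = magic_number, f = magic_number,
                  g = magic_number, h = magic_number;

    // TLS access between the loads and their use, so that a..h are all
    // live across it.
    ++tls_counter;

    // An even number of identical values xor to 0. If one copy were zeroed
    // behind the compiler's back, the result would be the magic value.
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}
```

With nothing clobbered, `xor_cancel()` returns 0; a nonzero result would mean one of the live copies was changed across the TLS access.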
