https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436
--- Comment #7 from Kevin Puetz <puetzk at puetzk dot org> ---
Created attachment 64543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64543&action=edit
g++ -g -shared -fPIC -O0 foo.cpp -o libfoo.so (reproduces with gcc 16.1.0)

The same one works if you just force it to use si instead of di. So I can either randomly fiddle with making ms_foo complicated enough that this ends up being the register allocation (as it was in my original code, but that involved wine and lots of cruft you don't really want), or just brute-force it by changing foo.cpp:20 to use a local register variable (https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html). I hope that's OK.

I.e. change line 20 of foo.cpp from

-int ret = -1;
+register int ret asm("si") = -1;

It then has to be compiled at -O0 instead of -O1, so that it actually uses %esi all the way through, rather than optimization passes changing that initial allocation. Also, at higher optimization levels, gcc 16 seems to like moving the call to __tls_get_addr earlier (above the volatile load), which of course masks the problem if the store to esi comes after the call.

But with the asm("si") and `g++ -g -shared -fPIC -O0 foo.cpp -o libfoo.so`, I can reproduce corruption with gcc:16.1.0 from https://hub.docker.com/_/gcc#supported-tags-and-respective-dockerfile-links

> ms_abi = 00000000
> sysv = 12345678
> ms_abi = 12345678
> ms_abi = 12345678

Using esi gives a different (and less random) corruption than the one seen with edi, since what we're now getting is the dtv as seen in update_get_addr. So it's nullptr the first time, and the second time we don't see corruption since update_get_addr doesn't have to run again. It would of course get more complicated with more dlopen/dlclose activity. But it does show *a* runtime malfunction, and the mechanism is the same (__tls_get_addr clobbering registers that ms_abi would consider callee-save).

I don't have trunk handy except through godbolt, and godbolt doesn't give me a way (that I know of) to get dlopen involved so that things have to pass through __tls_get_addr_slow. __tls_get_addr itself doesn't stomp on much if it doesn't fall off to the slow path, so godbolt can show the wrong-code, but executing there won't readily show a run-time symptom. I also admit I don't specifically know how to produce run-time corruption involving xmm*; I just can't find any promise on glibc's part that the slow path (which needs to malloc and such) won't get into arbitrary user-defined malloc replacements, or who knows what else. And sysv_abi in general considers those registers volatile, so stomping on them breaks no rules, and I think it's the job of the ms_abi -> sysv_abi boundary to consider them clobbered (like it does for ordinary function calls).
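
For reference, the set of registers at stake at that boundary can be written down as a small sketch. This is only an illustration of the two calling conventions' callee-saved sets as documented for x86-64; the function names are hypothetical and nothing here comes from foo.cpp or from GCC's sources.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Callee-saved (non-volatile) registers under the Microsoft x64 convention.
std::set<std::string> ms_abi_callee_saved() {
    std::set<std::string> regs = {"rbx", "rbp", "rdi", "rsi", "rsp",
                                  "r12", "r13", "r14", "r15"};
    for (int i = 6; i <= 15; ++i) {
        regs.insert("xmm" + std::to_string(i));  // xmm6..xmm15
    }
    return regs;
}

// Callee-saved registers under the System V x86-64 convention.
std::set<std::string> sysv_callee_saved() {
    return {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"};
}

// Registers an ms_abi function must preserve for its caller, but which
// any sysv_abi callee is free to clobber.
std::vector<std::string> at_risk_registers() {
    const std::set<std::string> ms = ms_abi_callee_saved();
    const std::set<std::string> sysv = sysv_callee_saved();
    std::vector<std::string> out;
    std::set_difference(ms.begin(), ms.end(), sysv.begin(), sysv.end(),
                        std::back_inserter(out));
    return out;
}
```

The difference is rdi, rsi and xmm6 through xmm15, i.e. the registers an ms_abi function has to treat as clobbered across a call into sysv_abi code.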
