https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66867
--- Comment #11 from dhowells at redhat dot com <dhowells at redhat dot com> ---
I applied the patch to the Fedora cross-gcc-6.1.1 rpm with one minor fixup.
Using the example code I added in bug 70825 I now see:

0000000000000000 <test_atomic_cmpxchg>:
   0:   ba 2a 00 00 00          mov    $0x2a,%edx
   5:   b8 17 00 00 00          mov    $0x17,%eax
   a:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
   e:   c3                      retq

There's now no extraneous store before the locked instruction.  And:

000000000000000f <test_atomic_cmpxchg_2>:
   f:   ba 2a 00 00 00          mov    $0x2a,%edx
  14:   b8 17 00 00 00          mov    $0x17,%eax
  19:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
  1d:   c3                      retq

It now just passes the return value of cmpxchg back directly, without
potentially putting it on and off the stack and maybe jumping round that bit.
And:

0000000000000043 <test_atomic_cmpxchg_B>:
  43:   ba 2a 00 00 00          mov    $0x2a,%edx
  48:   b8 17 00 00 00          mov    $0x17,%eax
  4d:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
  51:   c3                      retq

where it makes no difference how the return statements are constructed in C.

I've also booted a kernel built with the patched compiler.
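
For anyone reading without bug 70825 to hand, here is a rough sketch of the kind of C that would be expected to compile down to the sequences above. It is not the test code from that bug: the function names (sketch_cmpxchg, sketch_cmpxchg_branchy) and the way the return statements are written are assumptions for illustration. Only the constants are taken from the disassembly (0x17 = 23 as the expected old value in %eax, 0x2a = 42 as the new value in %edx). It uses the GCC __atomic_compare_exchange_n builtin, which writes the value actually found back into "expected" on failure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cmpxchg-style test: try to swap 23 -> 42
 * and return whatever value was found at *ptr. */
int sketch_cmpxchg(int *ptr)
{
    int expected = 23;  /* 0x17, the value cmpxchg compares against */

    /* Strong compare-exchange, sequentially consistent on both paths.
     * On failure the builtin stores the current *ptr into expected. */
    __atomic_compare_exchange_n(ptr, &expected, 42, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Same semantics, but with the return written as two branches.  This is
 * the sort of source-level variation that should make no difference to
 * the generated code. */
int sketch_cmpxchg_branchy(int *ptr)
{
    int expected = 23;

    if (__atomic_compare_exchange_n(ptr, &expected, 42, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return 23;          /* swap happened, old value was 23 */
    return expected;        /* swap failed, report what was there */
}
```

Both functions return the old contents of *ptr, which is what cmpxchg leaves in %eax, so a compiler with the patch applied has no need to spill "expected" to the stack or branch on the success flag. Whether a given compiler version actually emits the short sequence for this exact source has not been checked here.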