This bug is really transient, and AFAIK I only trigger it when using the cluebat on g++, that is, decorating every function in sight with always_inline/noinline attributes as appropriate, in a translation unit that inflates a lot.
I tracked one occurrence down to something like this:

    union float4_t { float f[4]; __m128 v; ... };

    static void foobar() {
        float4_t __attribute__((aligned (16))) bar;
        ...
        __m128 foo;
        ...
        bar = foo;
    }

If I let g++ decide whether foobar() should be inlined, everything's fine (except performance, of course). But if I force_inline foobar(), I may or may not get something to the effect of:

    40666a: movaps %xmm0,0x348(%esp)
    406672: mov    0x348(%esp),%eax
    406679: mov    %eax,0x310(%esp)
    406680: mov    0x34c(%esp),%eax
    406687: movaps 0x210(%esp),%xmm0
    40668f: mov    %eax,0x314(%esp)
    406696: mov    0x350(%esp),%eax
    40669d: movaps %xmm0,0x40(%esp)
    4066a2: mov    %eax,0x318(%esp)
    4066a9: mov    0x354(%esp),%eax

Why that value suddenly gets copied around, I don't know. It doesn't matter much anyway, as the program won't survive past the bogus store.

It's not just related to that kind of mixed union either, and again it clearly depends on surrounding functions being force_inlined or noinlined and on lots of stuff ending up on the stack.

I can trigger it on cygwin and linux, with g++ 4.1.0 and various 4.2.x, and once it is triggered it doesn't matter whether I use -Os or -Ox. It's been there for a long time, but this is the first time I've managed to track it down at all (inlining heuristics being extremely sensitive anyway).

I haven't filed a bug report yet, as that would require disclosing large amounts of code, but I'd like to know if it's a known issue by any chance.

Regards,
tbp.
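
For anyone wanting to check their own build for the same symptom, here is a minimal sketch of an alignment guard. It rests on assumptions that are not in the report itself: that %esp is 16-byte aligned at that point, in which case 0x348(%esp) cannot be a valid movaps target because 0x348 is not a multiple of 16. The names is_aligned16, misalignment16, float4_sketch and assert_aligned16 are hypothetical, and the union's __m128 member is left out so the sketch builds on any target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `p` sits on a 16-byte boundary,
// which is what an aligned SSE store such as movaps requires.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// Remainder of a stack offset modulo 16. A nonzero result means the slot
// cannot be 16-byte aligned if the base register itself is (an assumption).
constexpr unsigned misalignment16(unsigned offset) {
    return offset & 0xFu;
}

// Portable stand-in for the float4_t union from the post.
// Assumption: alignas(16) plays the role of __attribute__((aligned (16))).
struct alignas(16) float4_sketch {
    float f[4];
};

// Debug-build guard: call right before the vector store inside a
// force-inlined function to catch a misaligned stack slot early.
inline void assert_aligned16(const void* p) {
    assert(is_aligned16(p) && "stack slot is not 16-byte aligned");
}
```

Applied to the offsets in the disassembly, misalignment16(0x348) is 8 while misalignment16(0x210) and misalignment16(0x40) are both 0, which is at least consistent with the first movaps being the bogus store. Calling assert_aligned16(&bar) at the top of foobar() in a debug build would turn the crash into a readable assertion failure, though it may also perturb the inlining decisions that trigger the problem.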