https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63737
Bug ID: 63737
Summary: Missed optimization: -ffixed-reg and unelided copies
Product: gcc
Version: 4.9.1
Status: UNCONFIRMED
Severity: trivial
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: tstache1 at binghamton dot edu

Excuse me if I'm misunderstanding the semantics of -ffixed-reg and pinned
registers, but take this trivial example:

    #include <emmintrin.h>

    register __m128i fixed_reg_1 __asm__ ("xmm6");
    register __m128i fixed_reg_2 __asm__ ("xmm7");

    __m128i xmm_add(__m128i a, __m128i b)
    {
        __m128i dest;

        fixed_reg_1 = a + b;
        fixed_reg_2 = fixed_reg_1 + b;
        dest = fixed_reg_1 + fixed_reg_2;

        return dest;
    }

Compiling it (-c -O2 -ffixed-xmm6 -ffixed-xmm7 -mavx) seems to produce the
following non-optimal code (the compiler does not elide the copies into the
pinned registers):

    0000000000000000 <xmm_add>:
       0:   c5 f9 d4 c1     vpaddq %xmm1,%xmm0,%xmm0
       4:   c5 f9 6f f0     vmovdqa %xmm0,%xmm6
       8:   c5 f1 d4 c0     vpaddq %xmm0,%xmm1,%xmm0
       c:   c5 f9 6f f8     vmovdqa %xmm0,%xmm7
      10:   c5 f9 d4 c6     vpaddq %xmm6,%xmm0,%xmm0
      14:   c3              retq

Ideally, I would think that the compiler should write each result directly
into its pinned register and generate code like so, no?

    0000000000000000 <xmm_add_opt>:
       0:   c5 f9 d4 f1     vpaddq %xmm1,%xmm0,%xmm6
       4:   c5 f1 d4 fe     vpaddq %xmm6,%xmm1,%xmm7
       8:   c5 c1 d4 c6     vpaddq %xmm6,%xmm7,%xmm0
       c:   c3              retq
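
For reference, the dataflow of xmm_add can be sketched portably, without SSE or global register variables. This is only a model, not part of the test case. It assumes that `+` on `__m128i` is lane-wise 64-bit addition, as the vpaddq instructions in the disassembly suggest. The names vec2x64, add2x64, model_fixed_reg_1, model_fixed_reg_2 and model_xmm_add are hypothetical, and plain globals stand in for the pinned registers.

```c
#include <stdint.h>

/* Hypothetical stand-in for __m128i: two 64-bit lanes, as vpaddq sees it. */
typedef struct { uint64_t lane[2]; } vec2x64;

/* Lane-wise 64-bit addition with wraparound, modelling vpaddq. */
vec2x64 add2x64(vec2x64 x, vec2x64 y)
{
    vec2x64 r;
    r.lane[0] = x.lane[0] + y.lane[0];
    r.lane[1] = x.lane[1] + y.lane[1];
    return r;
}

/* Plain globals standing in for the registers pinned to xmm6 and xmm7. */
vec2x64 model_fixed_reg_1;
vec2x64 model_fixed_reg_2;

/* Same three additions, in the same order, as xmm_add in the report. */
vec2x64 model_xmm_add(vec2x64 a, vec2x64 b)
{
    model_fixed_reg_1 = add2x64(a, b);                    /* a + b   */
    model_fixed_reg_2 = add2x64(model_fixed_reg_1, b);    /* a + 2b  */
    return add2x64(model_fixed_reg_1, model_fixed_reg_2); /* 2a + 3b */
}
```

Under this model both instruction sequences above leave a + b in xmm6 and a + 2b in xmm7 and return 2a + 3b in xmm0. They differ only in the two vmovdqa copies.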