https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63737

            Bug ID: 63737
           Summary: Missed optimization: -ffixed-reg and unelided copies
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: trivial
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tstache1 at binghamton dot edu

Excuse me if I'm misunderstanding the semantics of -ffixed-reg and pinned
registers, but take this trivial example:

#include <emmintrin.h>

register __m128i fixed_reg_1 __asm__ ("xmm6");
register __m128i fixed_reg_2 __asm__ ("xmm7");

__m128i xmm_add(__m128i a, __m128i b) {
  __m128i dest;

  fixed_reg_1 = a + b;
  fixed_reg_2 = fixed_reg_1 + b;
  dest = fixed_reg_1 + fixed_reg_2;

  return dest;
}

Compiling it with -c -O2 -ffixed-xmm6 -ffixed-xmm7 -mavx produces the
following suboptimal code. Each sum is computed in xmm0 and then copied into
the fixed register, instead of being computed there directly:

0000000000000000 <xmm_add>:
   0:    c5 f9 d4 c1              vpaddq %xmm1,%xmm0,%xmm0
   4:    c5 f9 6f f0              vmovdqa %xmm0,%xmm6
   8:    c5 f1 d4 c0              vpaddq %xmm0,%xmm1,%xmm0
   c:    c5 f9 6f f8              vmovdqa %xmm0,%xmm7
  10:    c5 f9 d4 c6              vpaddq %xmm6,%xmm0,%xmm0
  14:    c3                       retq   

Ideally, I would expect the compiler to write each result straight into its
fixed register, like so:

0000000000000000 <xmm_add_opt>:
   0:    c5 f9 d4 f1              vpaddq %xmm1,%xmm0,%xmm6
   4:    c5 f1 d4 fe              vpaddq %xmm6,%xmm1,%xmm7
   8:    c5 c1 d4 c6              vpaddq %xmm6,%xmm7,%xmm0
   c:    c3                       retq
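
For comparison, here is a minimal sketch of the same arithmetic without the
pinned registers. The names xmm_add_ref and xmm_add_ref_check are hypothetical
and not part of the test case above. It assumes the `+` on __m128i in the test
case is 64-bit lane addition, as the vpaddq in the disassembly suggests. Each
lane should come out as 2*a + 3*b, which gives a baseline for checking results
and for diffing the generated code against the pinned version:

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical reference version of xmm_add: same arithmetic, but with
   ordinary locals instead of globals pinned to xmm6/xmm7. */
__m128i xmm_add_ref(__m128i a, __m128i b) {
  __m128i t1 = _mm_add_epi64(a, b);   /* a + b   (fixed_reg_1 in the test case) */
  __m128i t2 = _mm_add_epi64(t1, b);  /* a + 2b  (fixed_reg_2 in the test case) */
  return _mm_add_epi64(t1, t2);       /* 2a + 3b per 64-bit lane */
}

/* Store the result to memory and check both 64-bit lanes. Returns 1 on success. */
int xmm_add_ref_check(void) {
  long long out[2];
  __m128i a = _mm_set_epi64x(2, 1);   /* low lane = 1,  high lane = 2  */
  __m128i b = _mm_set_epi64x(20, 10); /* low lane = 10, high lane = 20 */

  _mm_storeu_si128((__m128i *)out, xmm_add_ref(a, b));
  assert(out[0] == 2 * 1 + 3 * 10);   /* low lane */
  assert(out[1] == 2 * 2 + 3 * 20);   /* high lane */
  return 1;
}
```

Building this with the same -O2 -mavx options, but without the -ffixed options,
should show what the register allocator does when it is free to pick the
destination registers itself.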
