http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51281

             Bug #: 51281
           Summary: GCC fails to hoist stores in loop
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: darkshik...@gmail.com


void foo( int *x )
{
    for(int i=0; i<100; i++) 
        asm("inc %0" :"+r"(*x));
}

This generates the following on gcc 4.5.3:

0000000000000000 <foo>:
   0:   b8 64 00 00 00          mov    eax,0x64
   5:   8b 17                   mov    edx,[rdi]
   7:   ff c2                   inc    edx
   9:   0f 1f 80 00 00 00 00    nop    dword[rax+0x0]
  10:   83 e8 01                sub    eax,0x1
  13:   89 17                   mov    [rdi],edx
  15:   75 f9                   jne    10 <foo+0x10>
  17:   f3 c3                   repz ret

And the following on gcc svn:

0000000000000000 <foo>:
   0:   b8 64 00 00 00          mov    eax,0x64
   5:   8b 17                   mov    edx,[rdi]
   7:   ff c2                   inc    edx
   9:   0f 1f 80 00 00 00 00    nop    dword[rax+0x0]
  10:   83 e8 01                sub    eax,0x1
  13:   89 17                   mov    [rdi],edx
  15:   75 f9                   jne    10 <foo+0x10>
  17:   f3 c3                   repz ret

gcc fails to hoist the store out of the loop and instead performs it on every
iteration, even though the "+r" constraint tells it to keep *x in a register.
This occurs in all versions of gcc, including latest svn.
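A minimal sketch of the transformation the report expects, done by hand in the source: load *x into a local once, let the asm operate on that local, and store it back once after the loop. The function name `foo_sunk` is hypothetical and is not from the report. Two further assumptions are not in the original test case. First, the asm is marked volatile so the compiler must execute it on every iteration. Second, the x86 `inc` is used only on x86 targets, with plain `++` as a fallback elsewhere. This is a workaround sketch, not a statement of what gcc itself should emit.

```c
/* Hypothetical workaround: manual store sinking for the test case above. */
void foo_sunk(int *x)
{
    int tmp = *x;                 /* single load before the loop */
    for (int i = 0; i < 100; i++) {
#if defined(__x86_64__) || defined(__i386__)
        /* Same "inc" asm as the test case, but on a local that the compiler
           can keep in a register; volatile forces one execution per iteration. */
        __asm__ __volatile__("inc %0" : "+r"(tmp));
#else
        tmp++;                    /* portable fallback for non-x86 targets */
#endif
    }
    *x = tmp;                     /* single store after the loop */
}
```

Calling foo_sunk on an int holding 5 leaves it holding 105. Because tmp's address is never taken, the loop body has no memory operand at all, so there is no store left to hoist.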

This may seem like an utterly pointless test case, but it is causing
significant performance degradation in actual code, where gcc repeatedly stores
the outputs of inline assembly at the end of each loop iteration instead of
keeping them in registers as it should.  In certain cases this inserts huge
numbers of redundant stores.  I initially thought it was an aliasing issue of
some sort, but this test case demonstrates that it happens even in the simplest
of cases, which is rather bizarre.

Is this optimization supposed to happen, or is this a missed optimization?
