http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51281
Bug #: 51281
Summary: GCC fails to hoist stores in loop
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: darkshik...@gmail.com

    void foo( int *x )
    {
        for(int i=0; i<100; i++)
            asm("inc %0" :"+r"(*x));
    }

This generates the following on gcc 4.5.3:

    0000000000000000 <foo>:
       0:   b8 64 00 00 00          mov    eax,0x64
       5:   8b 17                   mov    edx,[rdi]
       7:   ff c2                   inc    edx
       9:   0f 1f 80 00 00 00 00    nop    dword[rax+0x0]
      10:   83 e8 01                sub    eax,0x1
      13:   89 17                   mov    [rdi],edx
      15:   75 f9                   jne    10 <foo+0x10>
      17:   f3 c3                   repz ret

And the following on gcc svn:

    0000000000000000 <foo>:
       0:   b8 64 00 00 00          mov    eax,0x64
       5:   8b 17                   mov    edx,[rdi]
       7:   ff c2                   inc    edx
       9:   0f 1f 80 00 00 00 00    nop    dword[rax+0x0]
      10:   83 e8 01                sub    eax,0x1
      13:   89 17                   mov    [rdi],edx
      15:   75 f9                   jne    10 <foo+0x10>
      17:   f3 c3                   repz ret

gcc fails to hoist the store out of the loop and performs it on every iteration, even though the "+r" constraint tells it to keep *x in a register. This occurs in all versions of gcc, including the latest svn.

This may seem like an utterly pointless test case, but it is causing significant performance degradation in real code: gcc stores the outputs of inline assembly at the end of every loop iteration instead of keeping them in registers as it should, which in some cases inserts a large number of redundant stores. I initially thought it was an aliasing issue of some sort, but this test case shows that it happens even in the simplest case, which is rather bizarre. Is this optimization supposed to happen, or is this a missed optimization?
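For comparison, the transformation the report asks for (load *x once, keep the value in a register across the loop, store it once after the loop, usually called store sinking or scalar promotion) can be applied by hand. This is a minimal sketch, not GCC's own output: it assumes GCC-style extended asm on x86/x86-64 and falls back to a plain increment on other targets, the names bump and foo_sunk are hypothetical, and the asm is marked volatile (unlike in the report) so the compiler does not move it out of the loop or drop it.

```c
/* Hand-applied store sinking for the test case in the report.
 * bump() and foo_sunk() are hypothetical names used for illustration. */

static inline int bump(int v)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Same instruction as the report, applied to a local value.
     * volatile: GCC must not hoist this out of the loop or omit it. */
    __asm__ __volatile__("incl %0" : "+r"(v));
#else
    /* Portable stand-in so the sketch also builds on non-x86 targets. */
    v += 1;
#endif
    return v;
}

void foo_sunk(int *x)
{
    int v = *x;                  /* single load before the loop */
    for (int i = 0; i < 100; i++)
        v = bump(v);             /* value lives in a register, no memory traffic */
    *x = v;                      /* single store after the loop */
}
```

Calling foo_sunk(&a) with a == 5 leaves a == 105. By construction, the only store to *x is the one after the loop, which is the code shape the report expects GCC to produce from the original foo.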