http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477
--- Comment #17 from Kai Tietz <ktietz at gcc dot gnu.org> ---
What optimization do you expect here? With the new type-demotion pass I see
some changes in the optimized tree output:

foo ()
{
  int i;
  short int _4;
  char _5;
  unsigned short _6;
  unsigned short _8;
  short int _9;
  unsigned short _10;
  unsigned short _11;
  short int _12;
  sizetype _25;

  <bb 2>:
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # i_17 = PHI <i_14(3), 0(2)>
  _25 = (sizetype) i_17;
  _4 = MEM[symbol: a, index: _25, step: 2, offset: 0B];
  _5 = (char) _4;
  _6 = (unsigned short) _5;
  _9 = MEM[symbol: b, index: _25, step: 2, offset: 0B];
  _8 = (unsigned short) _9;
  _10 = _8 + 17;
  _11 = _10 + _6;
  _12 = (short int) _11;
  MEM[symbol: a, index: _25, step: 2, offset: 0B] = _12;
  i_14 = i_17 + 1;
  if (i_14 != 1024)
    goto <bb 3>;
  else
    goto <bb 5>;

  <bb 5>:
  return;

}

which then gets simplified to the following assembler on IA32:

_foo:
        xorl    %eax, %eax
        .p2align 4,,10
L2:
        movsbw  _a(%eax,%eax), %dx
        movzwl  _b(%eax,%eax), %ecx
        leal    17(%ecx,%edx), %edx
        movw    %dx, _a(%eax,%eax)
        addl    $1, %eax
        cmpl    $1024, %eax
        jne     L2
        rep ret

The same assembler is produced for me with all compilers back to 4.6.0; only
the tree-optimization output differs.
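
For readers who want to map the dump back to source, the loop can be sketched
in C roughly as follows. This is a reconstruction from the GIMPLE above, not
the testcase attached to the PR: the names foo, a and b and the bound 1024
come from the dump, while the element type short, the array sizes, and the
use of signed char (suggested by the movsbw in the assembler) are assumptions.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* Assumed declarations: the dump only shows 2-byte accesses to a and b. */
short a[1024];
short b[1024];

/* Hypothetical source-level equivalent of the optimized tree dump. */
void foo(void)
{
    for (int i = 0; i != 1024; i++) {
        /* _5, _6: narrow a[i] to char, then widen to unsigned short.
           char is assumed signed here, matching the sign-extending load.
           Narrowing an out-of-range value is implementation-defined. */
        unsigned short lo = (unsigned short)(signed char)a[i];

        /* _8, _10, _11: the additions happen in unsigned short, so they wrap. */
        unsigned short sum = (unsigned short)((unsigned short)b[i] + 17u + lo);

        /* _12: convert back to short and store into a[i]. */
        a[i] = (short)sum;
    }
}
```

As a quick check of the wrapping arithmetic: with a[1] = 3 and b[1] = 4 the
loop stores 24 into a[1], and with a[2] = -1 and b[2] = 0 it stores 16.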