http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172
--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 19 Feb 2014, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172
>
> Steven Bosscher <steven at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |steven at gcc dot gnu.org
>
> --- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> ---
> (In reply to Joey Ye from comment #11)
>
> Sometimes it helps to use -fdump-rtl-slim. Matter of taste but I find
> that much easier to interpret than LISP-like RTL dumps.
>
> Annotated "good expansion":
> ;; _41 = _42 * 4;
> 20: r126=r131<<2
>
> ;; _40 = _2 + _41;
> 21: r136=r130+r119 // r136=Arr_2_Par_Ref+r119
> 22: r125=r136+r126 // r125=Arr_2_Par_Ref+r119+r131<<2
>
> ;; MEM[(int[25] *)_51 + 20B] = _34;
> 29: r139=r130+r119 // r139=Arr_2_Par_Ref+r119
> 30: r140=r139+r126 // r140=Arr_2_Par_Ref+r119+r131<<2 (==r125)
> 31: r141=r140+1000 // r141=Arr_2_Par_Ref+r119+r131<<2+1000 (==r125+1000)
> 32: [r141+20]=r124
>
> In this case, the RHS for the SETs of r140 and r125 are lexically
> identical for value numbering, so the job for CSE is easy.
>
>
> Annotated "bad expansion":
> ;; _40 = Arr_2_Par_Ref_22(D) + _12;
> 22: r138=r128+r121
> 23: r127=r132+r138 // r127=Arr_2_Par_Ref+r128+r121
>
> ;; _32 = _20 + 1000;
> 29: r124=r121+1000
>
> ;; MEM[(int[25] *)_51 + 20B] = _34;
> 32: r141=r132+r124 // r141=Arr_2_Par_Ref+r121+1000
> 33: r142=r141+r128 // r142=Arr_2_Par_Ref+r128+r121+1000 (==r127+1000) (==r138+1000)
> 34: [r142+20]=r126
>
> Here, the "+1000" confuses CSE. The sets of r127 and r142 have a common
> sub-expression as value, but none of the sub-expressions are lexically
> identical. RTL CSE has limited ability to look through sub-expressions
> to identify "same value" sub-expressions (anchors, base regs, etc.) but
> apparently this case is too complex for it to handle.
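The lexical-identity point in the quoted comment can be illustrated with a toy model of hash-based value numbering, in which an expression is keyed by its operator and the value numbers of its operands (with PLUS operands put in a fixed order). This is a hypothetical sketch, not GCC's RTL CSE, which does considerably more (anchors, base registers, and so on); the function `value_number` and the tuple encoding of insns are assumptions made for illustration. Fed the address computations from the two slim dumps (stores left out), it finds r140 equal to r125 in the good expansion but no match at all in the bad one.

```python
# Toy hash-based value numbering over three-address insns (hypothetical model,
# not GCC's cse.c). Each insn is (dest, op, lhs, rhs); operands are register
# names (str) or integer constants.

def value_number(insns):
    table = {}   # key -> value number; keys are expressions, constants, live-ins
    reg_vn = {}  # register -> value number of the value it currently holds
    holder = {}  # value number -> first register that computed it
    hits = []    # (dest, earlier register already holding the same value)

    def vn_of(x):
        if isinstance(x, str) and x in reg_vn:
            return reg_vn[x]
        # Constants and registers live on entry each get their own number.
        key = ("const", x) if isinstance(x, int) else ("live_in", x)
        return table.setdefault(key, len(table))

    for dest, op, lhs, rhs in insns:
        a, b = vn_of(lhs), vn_of(rhs)
        if op == "plus":  # PLUS is commutative: put operands in a fixed order
            a, b = min(a, b), max(a, b)
        key = (op, a, b)
        if key in table:  # same operator on same values seen before: CSE hit
            hits.append((dest, holder[table[key]]))
        v = table.setdefault(key, len(table))
        holder.setdefault(v, dest)
        reg_vn[dest] = v
    return hits


# Insns 20-31 of the "good expansion" (the store at insn 32 is left out).
good = [
    ("r126", "ashift", "r131", 2),
    ("r136", "plus", "r130", "r119"),
    ("r125", "plus", "r136", "r126"),
    ("r139", "plus", "r130", "r119"),
    ("r140", "plus", "r139", "r126"),
    ("r141", "plus", "r140", 1000),
]
# Insns 22-33 of the "bad expansion" (the store at insn 34 is left out).
bad = [
    ("r138", "plus", "r128", "r121"),
    ("r127", "plus", "r132", "r138"),
    ("r124", "plus", "r121", 1000),
    ("r141", "plus", "r132", "r124"),
    ("r142", "plus", "r141", "r128"),
]

print(value_number(good))  # [('r139', 'r136'), ('r140', 'r125')]
print(value_number(bad))   # []
```

In the bad expansion r142 does hold r127 + 1000, but recognizing that requires reassociating the operands, which a purely lexical table like this one cannot do.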
So expansion generates "better" code (a single insn covering the two adds), caused by expanding a chain of two regular PLUS_EXPRs rather than a chain of two POINTER_PLUS_EXPRs. That's of course unfortunate, but I can't see how this is not a missed optimization in CSE ...

On the GIMPLE level before expansion we have

  _40 = Arr_2_Par_Ref_22(D) + (_41 + pretmp_20);
  _51 = Arr_2_Par_Ref_22(D) + (_41 + (pretmp_20 + 1000));

thus a similar issue: missed CSE due to bad association (and to not having a CSE pass after forwprop exposed the opportunity). Unfortunately we expose the opportunity only by late complete unrolling, because early unrolling says

  size: 7-2, last_iteration: 3-0
  Loop size: 7
  Estimated size after unrolling: 8
  Not unrolling loop 1: size would grow.

and you can't make it unroll that loop (outer loops are only ever unrolled early if doing so doesn't increase code size). Now the order is

  late unroll - reassoc - DOM - forwprop

which is exactly the wrong way around to eventually catch the CSE opportunity at the GIMPLE level; it would need to be

  late unroll - forwprop - reassoc - DOM.

Richard.
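The association problem in the two GIMPLE statements can be sketched the same way: if both addresses are rewritten into a canonical form (non-constant addends sorted, constants folded into one term), _51 becomes recognizably _40 + 1000. This is a hypothetical simplification of what a reassociation pass does: it treats the pointer addition as an ordinary integer PLUS and sorts addends by name, whereas GCC's reassoc orders operands by its own rank. `flatten` and `offset_between` are illustrative names, not GCC functions.

```python
# Canonicalize PLUS trees roughly as a reassociation pass might (hypothetical
# model). An expression is a leaf (SSA name as str, or int constant) or a
# tuple ("+", lhs, rhs).

def flatten(expr):
    """Return (sorted tuple of symbolic addends, folded constant) for a PLUS tree."""
    if isinstance(expr, int):
        return (), expr
    if isinstance(expr, str):
        return (expr,), 0
    op, lhs, rhs = expr
    if op != "+":
        raise ValueError("only PLUS chains are modelled")
    leaves_l, const_l = flatten(lhs)
    leaves_r, const_r = flatten(rhs)
    return tuple(sorted(leaves_l + leaves_r)), const_l + const_r


def offset_between(e1, e2):
    """Return c if e2 canonicalizes to e1 + c, otherwise None."""
    leaves1, c1 = flatten(e1)
    leaves2, c2 = flatten(e2)
    return c2 - c1 if leaves1 == leaves2 else None


base = "Arr_2_Par_Ref_22(D)"
addr_40 = ("+", base, ("+", "_41", "pretmp_20"))                 # _40
addr_51 = ("+", base, ("+", "_41", ("+", "pretmp_20", 1000)))    # _51

print(addr_40[2] == addr_51[2])          # False: offset subtrees differ lexically
print(offset_between(addr_40, addr_51))  # 1000: canonically, _51 == _40 + 1000
```

The canonical form only pays off if a CSE pass such as DOM runs after it has been produced, and reassoc in turn needs forwprop to have built the combined expressions first, which is why the pass order matters here.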