On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang <jiong.w...@arm.com> wrote:
> For PR62173, the ideal solution is to resolve the problem in the tree-level
> ivopt pass.
>
> Apart from the tree-level issue, PR 62173 also exposed two further
> RTL-level issues.
> One of them is that it looks like we could improve RTL-level loop invariant
> hoisting by re-shuffling insns.
>
> For Seb's testcase
>
> void bar(int i) {
>   char A[10];
>   int d = 0;
>   while (i > 0)
>     A[d++] = i--;
>
>   while (d > 0)
>     foo(A[d--]);
> }
>
> the insn sequence that calculates A[I]'s address looks like:
>
> (insn 76 75 77 22 (set (reg/f:DI 109)
>         (plus:DI (reg/f:DI 64 sfp)
>             (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
>      (expr_list:REG_DEAD (reg:DI 108 [ i ])
>         (nil)))
> (insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
>                     (const_int -16 [0xfffffffffffffff0])) [0 A S1 A8]))) seb-pop.c:8 76
> {*zero_extendqisi2_aarch64}
>      (expr_list:REG_DEAD (reg/f:DI 109)
>         (nil)))
>
> For most RISC archs, reg + reg addressing is typical, so we could
> re-shuffle the instruction sequence into the following:
>
> (insn 96 94 97 22 (set (reg/f:DI 129)
>         (plus:DI (reg/f:DI 64 sfp)
>             (const_int -16 [0xfffffffffffffff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
>      (nil))
> (insn 97 96 98 22 (set (reg:DI 130 [ i ])
>         (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70
> {*extendsidi2_aarch64}
>      (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
>         (nil)))
> (insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
>                     (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76
> {*zero_extendqisi2_aarch64}
>      (expr_list:REG_DEAD (reg:DI 130 [ i ])
>         (expr_list:REG_DEAD (reg/f:DI 129)
>             (nil))))
>
> which means re-associating the constant imm with the virtual frame pointer.
>
> That is, transform
>
>      RA <- fixed_reg + RC
>      RD <- MEM (RA + const_offset)
>
>   into:
>
>      RA <- fixed_reg + const_offset
>      RD <- MEM (RA + RC)
>
> Then RA <- fixed_reg + const_offset is loop invariant, so the later
> RTL GCSE PRE pass can catch it and hoist it, thus ameliorating what
> tree-level ivopts could not sort out.

There is a LIM pass after gimple ivopts. If the invariantness is already
visible there, why not handle it there, similar to the special cases in
rewrite_bittest and rewrite_reciprocal?

And of course similar tricks could be applied on the RTL level, in
RTL invariant motion?

Thanks,
Richard.

> This patch only tries to re-shuffle instructions within a single basic
> block that is an inner loop, which is perf critical.
>
> I am reusing the loop info in fwprop because loop info is available there
> and it runs before GCSE.
>
> Verified on aarch64 and mips64: the array base address is hoisted out of
> the loop.
>
> Bootstrap OK on x86-64 and aarch64.
>
> Comments?
>
> Thanks.
>
> gcc/
>   PR62173
>   fwprop.c (prepare_for_gcse_pre): New function.
>   (fwprop_done): Call it.
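
For illustration only, the re-association described in the quoted patch
description can be modelled at source level in C. This is a minimal sketch,
not code from the patch: memory is modelled as a byte array indexed by
"addresses", and the names FRAME_SIZE, FIXED_REG, CONST_OFFSET, ARRAY_LEN,
sum_original and sum_reassociated are all hypothetical. It only shows that
both shapes load the same bytes, and that in the second shape the address
base no longer depends on the loop counter.

```c
#include <assert.h>

/* Hypothetical model: addresses are indices into a byte array.
   FIXED_REG stands in for the frame pointer, CONST_OFFSET for the -16. */
enum { FRAME_SIZE = 64, FIXED_REG = 32, CONST_OFFSET = -16, ARRAY_LEN = 10 };

/* Original shape:
     RA <- fixed_reg + RC
     RD <- MEM (RA + const_offset)
   RA depends on RC, so it must be recomputed on every iteration. */
long sum_original(const unsigned char *mem, long n)
{
    long sum = 0;
    assert(n >= 0 && n <= ARRAY_LEN);
    for (long rc = 0; rc < n; rc++) {
        long ra = FIXED_REG + rc;
        sum += mem[ra + CONST_OFFSET];
    }
    return sum;
}

/* Re-associated shape:
     RA <- fixed_reg + const_offset
     RD <- MEM (RA + RC)
   RA no longer depends on RC, so it is computed once, outside the loop. */
long sum_reassociated(const unsigned char *mem, long n)
{
    long sum = 0;
    assert(n >= 0 && n <= ARRAY_LEN);
    long ra = FIXED_REG + CONST_OFFSET;   /* loop invariant */
    for (long rc = 0; rc < n; rc++)
        sum += mem[ra + rc];
    return sum;
}
```

In the second function the hoisting is done by hand. The point of the
proposed re-shuffle is to put the RTL into a shape where an invariant-motion
or PRE pass can do the same thing automatically.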