On 17 June 2014 16:15, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen > <zhenqiang.c...@linaro.org> wrote: >> Hi, >> >> For some large constant, ports like ARM, need one more instructions to >> operate it. e.g >> >> #define MASK 0xfe00ff >> void maskdata (int * data, int len) >> { >> int i = len; >> for (; i > 0; i -= 2) >> { >> data[i] &= MASK; >> data[i + 1] &= MASK; >> } >> } >> >> Need two instructions for each AND operation: >> >> and r3, r3, #16711935 >> bic r3, r3, #65536 >> >> If we keep the MASK in a register, loop2_invariant pass can hoist it >> out the loop. And it can be shared by different references. >> >> So the patch skips constant propagation if it makes INSN's cost higher. > > So cprop undos invariant motions work here?
Yes. GLOBAL CONST-PROP will undo invariant motions. > Should we make sure we add a REG_EQUAL note when not propagating? Logs show there already has REG_EQUAL note. >> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-17 Zhenqiang Chen <zhenqiang.c...@linaro.org> >> >> * cprop.c (try_replace_reg): Check cost for constants. >> >> diff --git a/gcc/cprop.c b/gcc/cprop.c >> index aef3ee8..c9cf02a 100644 >> --- a/gcc/cprop.c >> +++ b/gcc/cprop.c >> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> rtx src = 0; >> int success = 0; >> rtx set = single_set (insn); >> + int old_cost = 0; >> + bool copy_p = false; >> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >> + >> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >> + copy_p = true; >> + else >> + old_cost = set_rtx_cost (set, speed); > > Looks bogus for set == NULL? set_rtx_cost has checked it. If it is NULL, the function will return 0; > Also what about register pressure? Do you think it has big register pressure impact? I think it does not increase register pressure. > I think this kind of change needs wider testing as RTX costs are > usually not fully implemented and you introduce a new use kind > (or is it already used elsewhere in this way to compute cost > difference of a set with s/reg/const?). Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the decision. e.g. in function attempt_change of auto-inc-dec.c, it has code segments like: old_cost = (set_src_cost (mem, speed) + set_rtx_cost (PATTERN (inc_insn.insn), speed)); new_cost = set_src_cost (mem_tmp, speed); ... if (old_cost < new_cost) { ... return false; } The usage of RTX costs in this patch is similar. I had run X86-64 bootstrap and regression tests with --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java And ARM bootstrap and regression tests with --enable-languages=c,c++,fortran,lto,objc,obj-c++ I will run tests on i686. What other tests do you think I have to run? > What kind of performance difference do you see? I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm backend changes). Coremark with some options show >10% performance improvement. dhrystone is a little better. Some wave in eembc, but overall result is better. I will run spec2000 on X86-64 and ARM, and back to you about the performance changes. Thanks! -Zhenqiang > Thanks, > Richard. > >> /* Usually we substitute easy stuff, so we won't copy everything. >> We however need to take care to not duplicate non-trivial CONST >> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> to = copy_rtx (to); >> >> validate_replace_src_group (from, to, insn); >> + >> + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. >> + And it can be shared by different references. So skip propagation if >> + it makes INSN's rtx cost higher. */ >> + if (set && !copy_p && CONSTANT_P (to)) >> + { >> + int new_cost = set_rtx_cost (set, speed); >> + if (new_cost > old_cost) >> + { >> + cancel_changes (0); >> + return false; >> + } >> + } >> + >> if (num_changes_pending () && apply_change_group ()) >> success = 1;