On 11 August 2014 19:14, Ramana Radhakrishnan <ramana....@googlemail.com> wrote: > On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen > <zhenqiang.c...@linaro.org> wrote: >> On 8 August 2014 23:22, Ramana Radhakrishnan <ramana....@googlemail.com> >> wrote: >>> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen >>> <zhenqiang.c...@linaro.org> wrote: >>>> Hi, >>>> >>>> For some large constants, ARM will split them during expanding, which >>>> makes impossible to hoist them out the loop or shared by different >>>> references (refer the test case in the patch). >>>> >>>> The patch keeps some constants in registers. If the constant can not >>>> be optimized, the cprop and combine passes can optimize them as what >>>> we do in current expand pass with >>>> >>>> define_insn_and_split "*arm_subsi3_insn" >>>> define_insn_and_split "*arm_andsi3_insn" >>>> define_insn_and_split "*iorsi3_insn" >>>> define_insn_and_split "*arm_xorsi3" >>>> >>>> The patch does not modify addsi3 since the define_insn_and_split >>>> "*arm_addsi3" is only valid when (reload_completed || >>>> !arm_eliminable_register (operands[1])). The cprop and combine passes >>>> can not optimize the large constant if we put it in register, which >>>> will lead to regression. >>>> >>>> For logic operators, the patch skips changes for constants: >>>> >>>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >>>> >>>> since expand pass always uses "sign-extend" to get the value >>>> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >>>> logs show most negative values are UNSIGNED when they are TREE node. >>>> And combine pass is smart enough to recover the negative value to >>>> positive value. >>> >>> I am unable to verify any change in code generation for this testcase >>> with and without the patch when I had a play with the patch. >>> >>> what gives ? >> >> Thanks for trying the patch. >> >> Do you add option -fno-gcse which is mentioned in dg-options " -O2 >> -fno-gcse "? Without it, there is no change for the testcase since >> cprop pass will propagate the constant to AND expr (A patch to enhance >> cprop pass was discussed at >> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). > > Probably not and I can now see the difference in code generated for > Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see > the hoisting of the constant without your patch with -fno-gcse ?
The difference between ARM and THUMB2 modes are due to rtx_cost difference. For ARM mode, the constant is force_reg in function avoid_expensive_constant (obtabs.c) before gen_andsi3 when expanding. > So, the patch improves code generation for -mcpu=cortex-a15 -mthumb > -fno-gcse for the given testcase ? Yes. >> >> In addition, if the constant can not be hoisted out the loop, later >> combine pass can also optimize it. These (cprop and combine) are >> reasons why the patch itself has little impact on current tests. > > Does this mean you need the referred to patch to be useful as a > pre-requisite ? I fail to understand why this patch needs to go in if > it makes no difference without disabling GCSE. I cannot see -fno-gcse > being used by default for performant code. For some codes, -fno-gcse might get better performance. Please refer paper: A case study: optimizing GCC on ARM for performance of libevas rasterization library http://ctuning.org/dissemination/grow10-03.pdf The issues mentioned in the paper had been solved since arm_split_constant is smart enough to handle the 0xff00ff. But what for other irregular constant? The patch gives a chance to handle them. Thanks! -Zhenqiang > regards > Ramana