On 11 August 2014 19:14, Ramana Radhakrishnan <ramana....@googlemail.com> wrote:
> On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen
> <zhenqiang.c...@linaro.org> wrote:
>> On 8 August 2014 23:22, Ramana Radhakrishnan <ramana....@googlemail.com> 
>> wrote:
>>> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen
>>> <zhenqiang.c...@linaro.org> wrote:
>>>> Hi,
>>>>
>>>> For some large constants, ARM will split them during expansion, which
>>>> makes it impossible to hoist them out of the loop or share them between
>>>> different references (see the test case in the patch).
>>>>
>>>> The patch keeps some constants in registers. If the constant cannot
>>>> be hoisted or shared, the cprop and combine passes can still optimize
>>>> it, giving the same result the current expand pass gets through
>>>>
>>>> define_insn_and_split "*arm_subsi3_insn"
>>>> define_insn_and_split "*arm_andsi3_insn"
>>>> define_insn_and_split "*iorsi3_insn"
>>>> define_insn_and_split "*arm_xorsi3"
>>>>
>>>> The patch does not modify addsi3, since the define_insn_and_split
>>>> "*arm_addsi3" is only valid when (reload_completed ||
>>>> !arm_eliminable_register (operands[1])). The cprop and combine passes
>>>> cannot optimize the large constant if we put it in a register, which
>>>> would lead to a regression.
>>>>
>>>> For logic operators, the patch skips the change for constants where:
>>>>
>>>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]))
>>>>
>>>> since the expand pass always sign-extends the value for RTL
>>>> (trunc_int_for_mode, called from immed_wide_int_const), and the
>>>> logs show that most negative values are UNSIGNED as TREE nodes.
>>>> The combine pass is smart enough to recover the positive value
>>>> from the negative one.
>>>
>>> I was unable to see any change in code generation for this testcase
>>> with and without the patch when I had a play with it.
>>>
>>> What gives?
>>
>> Thanks for trying the patch.
>>
>> Did you add the option -fno-gcse, which is given in dg-options " -O2
>> -fno-gcse "? Without it, there is no change for the testcase, since
>> the cprop pass propagates the constant into the AND expression (a patch
>> to enhance the cprop pass was discussed at
>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html).
>
> Probably not, and I can now see the difference in code generated for
> Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see
> the constant hoisted even without your patch when using -fno-gcse?

The difference between ARM and Thumb-2 modes is due to rtx_cost
differences. In ARM mode, the constant is forced into a register
(force_reg) in avoid_expensive_constant (optabs.c) before gen_andsi3
is called during expansion.

> So, the patch improves code generation for -mcpu=cortex-a15 -mthumb
> -fno-gcse for the given testcase?

Yes.

>>
>> In addition, if the constant cannot be hoisted out of the loop, the
>> later combine pass can still optimize it. These two passes (cprop and
>> combine) are the reason the patch itself has little impact on the
>> current tests.
>
> Does this mean you need the patch referred to above as a prerequisite
> for this one to be useful? I fail to understand why this patch needs to
> go in if it makes no difference unless GCSE is disabled. I cannot see
> -fno-gcse being used by default for performance-critical code.

For some code, -fno-gcse can give better performance. Please refer to the paper:

A case study: optimizing GCC on ARM for performance of libevas
rasterization library
http://ctuning.org/dissemination/grow10-03.pdf

The issues mentioned in the paper have since been solved, because
arm_split_constant is smart enough to handle 0xff00ff. But what about
other irregular constants?

The patch gives a chance to handle them.

Thanks!
-Zhenqiang

> regards
> Ramana
