Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

Christophe Lyon Tue, 18 Oct 2016 01:37:33 -0700

Hi,


On 17 October 2016 at 18:47, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
> On 30/09/16 14:34, Bernd Edlinger wrote:
>>
>> On 09/30/16 12:14, Bernd Edlinger wrote:
>>>
>>> Eric Botcazou wrote:
>>>>>
>>>>> A comment before the SETs and a testcase would be nice.  IIRC
>>>>> we do have stack size testcases via using -fstack-usage.
>>>>
>>>> Or -Wstack-usage, which might be more appropriate here.
>>>
>>> Yes.  good idea.  I was not aware that we already have that kind of
>>> tests.
>>>
>>> When trying to write this test. I noticed, that I did not try -Os so far.
>>> But for -Os the stack is still the unchanged 3500 bytes.
>>>
>>> However for embedded targets I am often inclined to use -Os, and
>>> would certainly not expect the stack to explode...
>>>
>>> I see in arm.md there are places like
>>>
>>>         /* If we're optimizing for size, we prefer the libgcc calls.  */
>>>         if (optimize_function_for_size_p (cfun))
>>>           FAIL;
>>>
>> Oh, yeah.  The comment is completely misleading.
>>
>> If this pattern fails, expmed.c simply expands some
>> less efficient rtl, which also results in two shifts
>> and one or-op.  No libgcc calls at all.
>>
>> So in simple cases without spilling the resulting
>> assembler is the same, regardless if this pattern
>> fails or not.  But the half-defined out registers
>> make a big difference when it has to be spilled.
>>
>>>         /* Expand operation using core-registers.
>>>            'FAIL' would achieve the same thing, but this is a bit
>>> smarter.  */
>>>         scratch1 = gen_reg_rtx (SImode);
>>>         scratch2 = gen_reg_rtx (SImode);
>>>         arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0],
>>> operands[1],
>>>                                        operands[2], scratch1, scratch2);
>>>
>>>
>>> .. that explains why this happens.  I think it would be better to
>>> use the emit_coreregs for shift count >= 32, because these are
>>> effectively 32-bit shifts.
>>>
>>> Will try if that can be improved, and come back with the
>>> results.
>>>
>> The test case with -Os has 3520 bytes stack usage.
>> When only shift count >= 32 are handled we
>> have still 3000 bytes stack usage.
>> And when arm_emit_coreregs_64bit_shift is always
>> allowed to run, we have 2360 bytes stack usage.
>>
>> Also for the code size it is better not to fail this
>> pattern.  So I propose to remove this exception in all
>> three expansions.
>>
>> Here is an improved patch with the test case from the PR.
>> And a comment on the redundant SET why it is better to clear
>> the out register first.
>>
>>
>> Bootstrap and reg-testing on arm-linux-gnueabihf.
>> Is it OK for trunk?
>
>
> This looks ok to me.
> Thanks,
> Kyrill
>

I am seeing a lot of regressions since this patch was committed:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

(you can click on "REGRESSED" to see the list of regressions, "sum"
and "log" to download
the corresponding .sum/.log)

Thanks,

Christophe

>>
>> Thanks
>> Bernd.
>
>

Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

Reply via email to