[Bug target/77308] surprisingly large stack usage for sha512 on arm

wilco at gcc dot gnu.org Wed, 26 Oct 2016 11:20:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #20 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #19)
> I think the problem with anddi iordi and xordi instructions is that
> they obscure the data flow between low and high half words.
> When they are not enabled, we have the low and high parts
> expanded independently, but in the case of the di mode instructions
> it is not clear which of the half words propagate from input to output.
> 
> With my new patch, we have 2328 bytes stack for hard float point,
> and only 272 bytes for arm-none-eabi which is a target I care about.
> 
> 
> This is still not perfect, but certainly a big improvement.
> 
> Wilco, where have you seen the additional registers used with my
> previous patch, maybe we can try to fix that somehow?

What happens is that the move of zero causes us to use extra registers in
shifts as both source and destination are now always live at the same time. We
generate worse code for simple examples like x | (y << 3):

-mfpu=vfp:
        push    {r4, r5}
        lsls    r5, r1, #3
        orr     r5, r5, r0, lsr #29
        lsls    r4, r0, #3
        orr     r0, r4, r2
        orr     r1, r5, r3
        pop     {r4, r5}
        bx      lr
-mfpu=neon:
        lsls    r1, r1, #3
        orr     r1, r1, r0, lsr #29
        lsls    r0, r0, #3
        orrs    r0, r0, r2
        orrs    r1, r1, r3
        bx      lr

So that means this is not a solution.
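
For reference, here is a minimal C sketch of the kind of source behind the
example above. The function names and the parameter order (shifted operand
first, to match r0/r1 being shifted in the listings) are assumptions for
illustration, not the actual test case. The second function writes out by hand
how the 64-bit shift-and-or decomposes into 32-bit halves, which is roughly
what the shorter sequence computes:

```c
#include <stdint.h>

/* 64-bit form of the example: x | (y << 3).
   Hypothetical name; y is passed first so that it would land in r0/r1. */
uint64_t or_shl3(uint64_t y, uint64_t x)
{
    return x | (y << 3);
}

/* The same computation split by hand into 32-bit low/high halves.
   Only the high half of the shift needs bits from the low half of y;
   the low half of the result never depends on the high inputs. */
uint64_t or_shl3_split(uint32_t y_lo, uint32_t y_hi,
                       uint32_t x_lo, uint32_t x_hi)
{
    uint32_t hi = (y_hi << 3) | (y_lo >> 29); /* lsls + orr ..., lsr #29 */
    uint32_t lo = y_lo << 3;                  /* lsls */
    lo |= x_lo;                               /* orr with low half of x */
    hi |= x_hi;                               /* orr with high half of x */
    return ((uint64_t)hi << 32) | lo;
}
```

Written this way, each half has a short, independent live range, so no
registers beyond the four argument registers should be needed.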

Note that init_regs already inserts moves of zero before expanded shifts (I get
the same code with -mfpu=vfp with or without your previous patch), so doing it
again in the expander shouldn't be necessary. Why does it still make a
difference? Presumably init_regs either doesn't find all cases or doesn't
insert the moves at the right place, so we should fix that rather than do it in
the shift expansion.

However the underlying issue is that DI mode operations are not all split at
exactly the same time, and that is what needs to be fixed.
