[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

steven at gcc dot gnu.org Tue, 13 Nov 2012 15:38:20 -0800


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285




--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 23:37:52 UTC ---

Created attachment 28678
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28678
Gross hack



(In reply to comment #11)
> If loops are still around at LRA time, perhaps LRA should consider putting
> it before loop if register pressure is low, or LIM could just have extra
> code for this



Unfortunately, loops are destroyed _just_ before LRA, at the end of IRA.
IRA has its own loop tree but that is destroyed before LRA, too.





> I'm not saying it must be LIM, I'm
> just looking for suggestions where to perform this.



LIM may be too early. I've experimented with the attached patch (based off
some other patch for invariant addresses that was bit-rotting on a shelf)
and I had to resort to some crude hacks to make loop-invariant even just
consider moving the bare frame_pointer_rtx, like manually setting the cost
to something high because set_src_cost(frame_pointer_rtx)==0.  The result
is this code:



foo:
        leaq    -72(%rsp), %rcx
        leaq    -8(%rsp), %rdx     // A Pyrrhic victory...
        .p2align 4,,10
        .p2align 3
.L5:
        movq    %rcx, %rax
        .p2align 4,,10
        .p2align 3
.L3:
        movb    $0, (%rax)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
        subl    $64, %edi
        testl   %edi, %edi
        jg      .L5
        rep ret
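
For readers following the assembly, here is a rough C reading of the loop
nest above. It is a sketch reconstructed from the generated code only, not
the libgcrypt source: the name burn_stack_sketch, the BUF_SIZE constant and
the returned pass count are assumptions added for illustration. The two
pointers computed before the outer loop correspond to the two leaq
instructions that now sit ahead of .L5.

```c
#include <stddef.h>

/* Assumed size: -72(%rsp) up to -8(%rsp) spans 64 bytes. */
enum { BUF_SIZE = 64 };

/* Hypothetical C model of the loop nest in the assembly above.
   Returns the number of outer-loop passes (an addition for checking;
   the real function returns nothing). */
int burn_stack_sketch(int bytes)
{
    char buf[BUF_SIZE];
    char *const begin = buf;             /* leaq -72(%rsp), %rcx */
    char *const end = buf + BUF_SIZE;    /* leaq -8(%rsp), %rdx  */
    int passes = 0;

    do {                                 /* .L5 */
        volatile char *p = begin;        /* movq %rcx, %rax */
        do {                             /* .L3 */
            *p = 0;                      /* movb $0, (%rax) */
            p++;                         /* addq $1, %rax */
        } while (p != end);              /* cmpq %rdx, %rax ; jne .L3 */
        bytes -= BUF_SIZE;               /* subl $64, %edi */
        passes++;
    } while (bytes > 0);                 /* testl %edi, %edi ; jg .L5 */

    return passes;
}
```

With this model, a request for 200 bytes makes four passes over the 64-byte
buffer, and any request of 64 bytes or fewer makes exactly one.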





Need to think about this a bit more, perhaps postreload-gcse can be used
for this instead of LIM...
