[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|steven at gcc dot gnu.org   |
         AssignedTo|unassigned at gcc dot       |steven at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #10 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 10:30:27 UTC ---
There are several reasons why RTL LIM cannot currently hoist the (frame) rtx.

The first is that in general it stays away from any HARD_REGISTER_P reg with
a 10-foot pole. For most hard registers this is probably a good strategy:
Anything that's in a real hard register at this point is there for a reason
(function return, global reg, whatever) and almost certainly not invariant in
a loop.

Second, RTL LIM only hoists expressions that can be assigned to the original
SET_DEST of a single set. In the case of this PR, the insn in question is:

    ;; UD chains for insn luid 2 uid 15
    ;;      reg 20 { }
    ;;      reg 63 { d2(bb 4 insn 12) }
    (insn 15 12 16 4 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg/v/f:DI 63 [ p ])
                (reg/f:DI 20 frame))) PR52285.c:10 8 {*cmpdi_1}
         (nil))

This fails in may_assign_reg_p because (reg:CCZ 17) can't be assigned to (it
is a hard register, and I suppose it has class NO_REGS), so the SET_SRC is not
even considered by find_invariant_insn as a potential invariant. I think this
condition can be relaxed with something like:

    Index: loop-invariant.c
    ===
    --- loop-invariant.c    (revision 193454)
    +++ loop-invariant.c    (working copy)
    @@ -874,11 +874,11 @@
       dest = SET_DEST (set);

       if (!REG_P (dest)
    -      || HARD_REGISTER_P (dest))
    +      || HARD_REGISTER_P (dest)
    +      || !may_assign_reg_p (dest))
         simple = false;

    -  if (!may_assign_reg_p (SET_DEST (set))
    -      || !check_maybe_invariant (SET_SRC (set)))
    +  if (!check_maybe_invariant (SET_SRC (set)))
         return;

       /* If the insn can throw exception, ...

Finally, RTL LIM cannot hoist parts of expressions. It only hoists the SET_SRC
as a whole, or nothing at all. I have patches for that, originally developed
to hoist addresses out of MEMs. I'll dust them off and see if I can make it
handle (reg:frame + CONST_INT) and other expressions that involve eliminable
regs.
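
The effect of the relaxed guard in comment #10 can be checked with a small
standalone model. This is only a sketch and not GCC code: the struct, the enum
and both function names are hypothetical, each boolean field stands in for the
GCC predicate named in its comment, and the model assumes that `simple` starts
out true, which the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a single_set insn; every field is hypothetical. */
struct toy_set {
    bool dest_is_reg;          /* models REG_P (dest) */
    bool dest_is_hard_reg;     /* models HARD_REGISTER_P (dest) */
    bool dest_assignable;      /* models may_assign_reg_p (dest) */
    bool src_maybe_invariant;  /* models check_maybe_invariant (SET_SRC (set)) */
};

enum toy_verdict {
    TOY_REJECTED,    /* the guard returns early, insn is never a candidate */
    TOY_NOT_SIMPLE,  /* insn stays a candidate with simple == false */
    TOY_SIMPLE       /* insn stays a candidate with simple == true */
};

/* Guard as it reads on the '-' side of the diff. */
static enum toy_verdict toy_guard_before(const struct toy_set *s)
{
    assert(s != NULL);
    bool simple = true;  /* assumption: initial value is not in the diff */
    if (!s->dest_is_reg || s->dest_is_hard_reg)
        simple = false;
    if (!s->dest_assignable || !s->src_maybe_invariant)
        return TOY_REJECTED;
    return simple ? TOY_SIMPLE : TOY_NOT_SIMPLE;
}

/* Guard as it reads on the '+' side: an unassignable destination only
   clears `simple` instead of rejecting the insn outright. */
static enum toy_verdict toy_guard_after(const struct toy_set *s)
{
    assert(s != NULL);
    bool simple = true;  /* same assumption as above */
    if (!s->dest_is_reg || s->dest_is_hard_reg || !s->dest_assignable)
        simple = false;
    if (!s->src_maybe_invariant)
        return TOY_REJECTED;
    return simple ? TOY_SIMPLE : TOY_NOT_SIMPLE;
}
```

For a set shaped like insn 15 (hard-register destination that cannot be
assigned to, source assumed to pass the invariance check), the old guard
rejects the insn while the new one keeps it as a non-simple candidate. A set
with an assignable pseudo destination is classified the same way by both.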
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-13 10:43:58 UTC ---
I think (plus (frame) (const_int)) is likely not an issue (at least in the
common case; it could only be a problem if eliminated into something that
needs a much bigger offset that doesn't fit into the instruction anymore).
The problem is an eliminable register used alone somewhere where
(plus (eliminate_to) (const_int small_int)) wouldn't be handled and thus
would need to be reloaded.

If loops are still around at LRA time, perhaps LRA should consider putting it
before the loop if register pressure is low, or LIM could just have extra code
for this: first handle normal IV motions and just record if there are any
eliminable regs not used inside of plus with const_int, and at the end, if
register pressure still isn't too high, consider just creating a new insn that
sets a pseudo to (frame) or another eliminable register before the loop and
replacing all uses of (frame) in the loop with that.

I'm not saying it must be LIM, I'm just looking for suggestions where to
perform this.
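
The bookkeeping suggested in comment #11 can be sketched on a toy
representation of the register uses in a loop body. Nothing here is GCC code:
the struct, the function name, and the integer "pressure" and "limit"
parameters are hypothetical, and the frame register number 20 is only borrowed
from the dump in comment #10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Register number used for the frame pointer in this toy model; 20 matches
   (reg/f:DI 20 frame) in the dump from comment #10, otherwise arbitrary. */
enum { TOY_FRAME_REGNO = 20 };

/* One register use inside the loop body (hypothetical representation). */
struct toy_use {
    int regno;               /* register being read */
    bool in_plus_const_int;  /* use has the form (plus (reg) (const_int)) */
};

/* If pressure is below the limit, rewrite every bare use of the frame
   register to new_pseudo and return how many uses were rewritten. A nonzero
   result means the caller would also have to emit "new_pseudo = frame" once
   before the loop. Uses inside plus-with-const_int are left alone, since
   comment #11 expects elimination to handle those. */
static size_t toy_hoist_bare_frame_uses(struct toy_use *uses, size_t n_uses,
                                        int new_pseudo,
                                        int pressure, int pressure_limit)
{
    assert(uses != NULL || n_uses == 0);
    if (pressure >= pressure_limit)
        return 0;  /* too many live values: leave the loop untouched */

    size_t rewritten = 0;
    for (size_t i = 0; i < n_uses; i++) {
        if (uses[i].regno == TOY_FRAME_REGNO && !uses[i].in_plus_const_int) {
            uses[i].regno = new_pseudo;
            rewritten++;
        }
    }
    return rewritten;
}
```

With the uses {reg 63, bare reg 20, reg 20 inside a plus} and low pressure,
only the bare use of reg 20 is redirected to the new pseudo. With pressure at
the limit, nothing changes.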
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 23:37:52 UTC ---
Created attachment 28678
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28678
Gross hack

(In reply to comment #11)
> If loops are still around at LRA time, perhaps LRA should consider
> putting it before loop if register pressure is low, or LIM could
> just have extra code for this

Unfortunately, loops are destroyed _just_ before LRA, at the end of IRA. IRA
has its own loop tree but that is destroyed before LRA, too.

> I'm not saying it must be LIM, I'm just looking for suggestions where
> to perform this.

LIM may be too early. I've experimented with the attached patch (based off
some other patch for invariant addresses that was bit-rotting on a shelf) and
I had to resort to some crude hacks to make loop-invariant even just consider
moving the bare frame_pointer_rtx, like manually setting the cost to something
high because set_src_cost(frame_pointer_rtx)==0.

The result is this code:

    foo:
            leaq    -72(%rsp), %rcx
            leaq    -8(%rsp), %rdx  // A Pyrrhic victory...
            .p2align 4,,10
            .p2align 3
    .L5:
            movq    %rcx, %rax
            .p2align 4,,10
            .p2align 3
    .L3:
            movb    $0, (%rax)
            addq    $1, %rax
            cmpq    %rdx, %rax
            jne     .L3
            subl    $64, %edi
            testl   %edi, %edi
            jg      .L5
            rep ret

Need to think about this a bit more, perhaps postreload-gcse can be used for
this instead of LIM...
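
For readers who prefer C to assembly, the loop structure in comment #12 can be
read back roughly as follows. This is a reconstruction from the assembly only,
not the PR's test case: the function name is hypothetical, the buffer is
passed in by the caller instead of living on the stack, and the pass counter
is added purely so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_CHUNK = 64 };  /* bytes cleared per outer pass (subl $64, %edi) */

/* Clears TOY_CHUNK bytes at buf once per outer pass and returns the number
   of passes. buf must point to at least TOY_CHUNK bytes. */
static int toy_burn_passes(volatile char *buf, int bytes)
{
    assert(buf != NULL);

    /* Both bounds are computed once, before either loop, which is what the
       two leaq instructions ahead of .L5 do in the assembly. */
    volatile char *const start = buf;             /* leaq -72(%rsp), %rcx */
    volatile char *const end = buf + TOY_CHUNK;   /* leaq -8(%rsp), %rdx  */

    int passes = 0;
    do {                                /* .L5 */
        volatile char *p = start;       /* movq %rcx, %rax */
        do {                            /* .L3 */
            *p = 0;                     /* movb $0, (%rax) */
            p++;                        /* addq $1, %rax   */
        } while (p != end);             /* cmpq %rdx, %rax ; jne .L3 */
        bytes -= TOY_CHUNK;             /* subl $64, %edi  */
        passes++;
    } while (bytes > 0);                /* testl %edi, %edi ; jg .L5 */

    return passes;
}
```

The inner loop compares against a value held in a register, so the end bound
is no longer recomputed inside the loop. As in the assembly, the body runs at
least once even when `bytes` is zero or negative.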
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.2                       |4.7.3

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-09-20 10:18:54 UTC ---
GCC 4.7.2 has been released.
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.1                       |4.7.2

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-14 08:29:16 UTC ---
GCC 4.7.1 is being released, adjusting target milestone.
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.0                       |4.7.1

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-03-22 08:26:47 UTC ---
GCC 4.7.0 is being released, adjusting target milestone.