[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-11-13 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285



Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|steven at gcc dot gnu.org   |
         AssignedTo|unassigned at gcc dot       |steven at gcc dot gnu.org
                   |gnu.org                     |



--- Comment #10 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 10:30:27 UTC ---

There are several reasons why RTL LIM cannot currently hoist the (frame) rtx.

The first is that, in general, it won't touch any HARD_REGISTER_P reg with
a 10-foot pole.  For most hard registers this is probably a good strategy:
anything that's in a real hard register at this point is there for a reason
(function return, global reg, whatever) and almost certainly not invariant in
a loop.

Second, RTL LIM only hoists expressions that can be assigned to the original
SET_DEST of a single set.  In the case of this PR, the insn in question is:



;;   UD chains for insn luid 2 uid 15
;;      reg 20 { }
;;      reg 63 { d2(bb 4 insn 12) }
(insn 15 12 16 4 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v/f:DI 63 [ p ])
            (reg/f:DI 20 frame))) PR52285.c:10 8 {*cmpdi_1}
     (nil))



This fails in may_assign_reg_p because (reg:CCZ 17) can't be assigned to (it
is a hard register, and I suppose it has class NO_REGS), so the SET_SRC is
not even considered by find_invariant_insn as a potential invariant.  I think
this condition can be relaxed with something like:



Index: loop-invariant.c
===================================================================
--- loop-invariant.c    (revision 193454)
+++ loop-invariant.c    (working copy)
@@ -874,11 +874,11 @@
   dest = SET_DEST (set);
 
   if (!REG_P (dest)
-      || HARD_REGISTER_P (dest))
+      || HARD_REGISTER_P (dest)
+      || !may_assign_reg_p (dest))
     simple = false;
 
-  if (!may_assign_reg_p (SET_DEST (set))
-      || !check_maybe_invariant (SET_SRC (set)))
+  if (!check_maybe_invariant (SET_SRC (set)))
     return;
 
   /* If the insn can throw exception, ...
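
To make the effect of that change easier to see, here is a minimal sketch in
plain C that models only the two tests touched by the patch.  It is not GCC
code: struct toy_dest, enum toy_verdict and both toy_filter_* functions are
hypothetical stand-ins, and the three booleans merely mimic what REG_P,
HARD_REGISTER_P and may_assign_reg_p would report for a destination.

```c
#include <stdbool.h>

/* Toy stand-in for a SET_DEST; NOT GCC's rtx.  All names are hypothetical. */
struct toy_dest {
    bool is_reg;       /* models REG_P (dest) */
    bool is_hard_reg;  /* models HARD_REGISTER_P (dest) */
    bool assignable;   /* models may_assign_reg_p (dest) */
};

enum toy_verdict {
    TOY_REJECTED,              /* insn is not recorded as a candidate */
    TOY_CANDIDATE_NOT_SIMPLE,  /* recorded, but with simple == false */
    TOY_CANDIDATE_SIMPLE       /* recorded with simple == true */
};

/* Before the patch: an unassignable destination rejects the insn outright,
   so its SET_SRC is never looked at.  */
enum toy_verdict toy_filter_before(struct toy_dest d, bool src_maybe_invariant)
{
    bool simple = d.is_reg && !d.is_hard_reg;

    if (!d.assignable || !src_maybe_invariant)
        return TOY_REJECTED;
    return simple ? TOY_CANDIDATE_SIMPLE : TOY_CANDIDATE_NOT_SIMPLE;
}

/* After the patch: an unassignable destination only clears "simple";
   the source alone decides whether the insn is rejected.  */
enum toy_verdict toy_filter_after(struct toy_dest d, bool src_maybe_invariant)
{
    bool simple = d.is_reg && !d.is_hard_reg && d.assignable;

    if (!src_maybe_invariant)
        return TOY_REJECTED;
    return simple ? TOY_CANDIDATE_SIMPLE : TOY_CANDIDATE_NOT_SIMPLE;
}
```

For a destination like the flags register (a hard register that cannot be
assigned to), the first function rejects the insn and the second keeps it as
a non-simple candidate; an ordinary pseudo destination is treated the same by
both.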





Finally, RTL LIM cannot hoist parts of expressions.  It only hoists the
SET_SRC as a whole, or nothing at all.  I have patches for that, originally
developed to hoist addresses out of MEMs.  I'll dust them off and see if
I can make it handle (reg:frame + CONST_INT) and other expressions that
involve eliminable regs.
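
The difference between whole-SET_SRC hoisting and hoisting parts of an
expression can be illustrated with a toy expression tree.  This is only a
sketch under assumptions: struct toy_expr and the toy_* functions are
hypothetical, they are not the patches mentioned above, and they only count
what each policy could move.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree; NOT GCC's rtx.  All names are hypothetical. */
enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS, TOY_COMPARE };

struct toy_expr {
    enum toy_code code;
    bool reg_invariant;            /* only meaningful for TOY_REG */
    const struct toy_expr *op0;    /* operands of TOY_PLUS / TOY_COMPARE */
    const struct toy_expr *op1;
};

/* An expression is invariant if all of its leaves are.  */
bool toy_invariant_p(const struct toy_expr *e)
{
    switch (e->code) {
    case TOY_CONST:
        return true;
    case TOY_REG:
        return e->reg_invariant;
    default:
        return toy_invariant_p(e->op0) && toy_invariant_p(e->op1);
    }
}

/* Whole-or-nothing policy: the source is moved only if all of it is
   invariant.  Returns the number of expressions that could be moved.  */
int toy_hoist_whole(const struct toy_expr *src)
{
    return toy_invariant_p(src) ? 1 : 0;
}

/* Partial policy: count the maximal invariant sub-expressions, ignoring
   bare constants, which are not worth a register.  */
int toy_hoist_parts(const struct toy_expr *e)
{
    if (e->code == TOY_CONST)
        return 0;
    if (toy_invariant_p(e))
        return 1;
    if (e->code == TOY_REG)
        return 0;
    return toy_hoist_parts(e->op0) + toy_hoist_parts(e->op1);
}
```

For a compare of a loop-variant register against an invariant one, the
whole-or-nothing policy finds nothing to move, while the partial policy finds
the invariant operand.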


[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-11-13 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285



--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-13 10:43:58 UTC ---

I think (plus (frame) (const_int)) is likely not an issue (at least in the
common case; it could only be a problem if eliminated into something that
needs a much bigger offset that no longer fits into the instruction).  The
problem is an eliminable register used alone somewhere where
(plus (eliminate_to) (const_int small_int)) wouldn't be handled and thus would
need to be reloaded.

If loops are still around at LRA time, perhaps LRA should consider putting it
before loop if register pressure is low, or LIM could just have extra code for
this (first handle normal IV motions and just record whether there are any
eliminable regs not used inside of a plus with const_int, and at the end, if
register pressure still isn't too high, consider just creating a new insn that
sets a pseudo to (frame) or another eliminable register before the loop and
replacing all uses of (frame) in the loop with that).  I'm not saying it must
be LIM, I'm just looking for suggestions where to perform this.
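
As a rough illustration of the second idea, the decision could look something
like the sketch below.  Everything in it is an assumption made for the
example: struct toy_frame_use, toy_hoist_eliminable and the integer "pressure"
model are hypothetical and do not correspond to existing GCC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* One use of an eliminable register inside a loop.  Hypothetical model.  */
struct toy_frame_use {
    bool inside_plus_const;  /* use has the form (plus (frame) (const_int)) */
    bool uses_pseudo;        /* set once the use is rewritten to the pseudo */
};

/* Run after the normal invariant motion.  If any use of the eliminable
   register is "bare" (not inside plus/const_int) and there is still a free
   register, pretend to emit "pseudo = (frame)" before the loop and rewrite
   all uses.  Returns the number of uses rewritten.  */
size_t toy_hoist_eliminable(struct toy_frame_use *uses, size_t n,
                            int pressure, int pressure_limit)
{
    size_t bare = 0;

    for (size_t i = 0; i < n; i++)
        if (!uses[i].inside_plus_const)
            bare++;

    /* Nothing would need a reload, or no register to spare: do nothing.  */
    if (bare == 0 || pressure >= pressure_limit)
        return 0;

    /* A real pass would create the new insn in the loop preheader here.  */
    for (size_t i = 0; i < n; i++)
        uses[i].uses_pseudo = true;
    return n;
}
```

The check against pressure_limit stands in for "register pressure still isn't
too high"; how that would actually be measured is left open here.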


[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-11-13 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285



--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 23:37:52 UTC ---

Created attachment 28678
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28678
Gross hack



(In reply to comment #11)
> If loops are still around at LRA time, perhaps LRA should consider putting
> it before loop if register pressure is low, or LIM could just have extra
> code for this



Unfortunately, loops are destroyed _just_ before LRA, at the end of IRA.
IRA has its own loop tree but that is destroyed before LRA, too.





> I'm not saying it must be LIM, I'm
> just looking for suggestions where to perform this.



LIM may be too early.  I've experimented with the attached patch (based off
some other patch for invariant addresses that was bit-rotting on a shelf)
and I had to resort to some crude hacks to make loop-invariant even just
consider moving the bare frame_pointer_rtx, like manually setting the cost
to something high because set_src_cost(frame_pointer_rtx)==0.  The result
is this code:



foo:
        leaq    -72(%rsp), %rcx
        leaq    -8(%rsp), %rdx  // A Pyrrhic victory...
        .p2align 4,,10
        .p2align 3
.L5:
        movq    %rcx, %rax
        .p2align 4,,10
        .p2align 3
.L3:
        movb    $0, (%rax)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
        subl    $64, %edi
        testl   %edi, %edi
        jg      .L5
        rep ret
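
For readers without the test case at hand, the assembly corresponds to a
function of roughly the following shape.  This is a reconstruction from the
code above and not the actual PR52285.c; the names toy_wipe and toy_burn_stack
and the explicit do/while form are assumptions.

```c
#include <stddef.h>

/* Inner loop (.L3): store zero bytes until the end pointer is reached.
   The volatile qualifier keeps the stores from being optimized away.  */
void toy_wipe(volatile char *p, const volatile char *end)
{
    while (p != end)
        *p++ = 0;
}

/* Outer loop (.L5): wipe a 64-byte stack buffer once per 64 requested
   bytes.  Both buf and buf + sizeof buf are frame-based addresses that do
   not change inside either loop.  */
void toy_burn_stack(int bytes)
{
    char buf[64];

    do {
        toy_wipe(buf, buf + sizeof buf);
        bytes -= (int) sizeof buf;
    } while (bytes > 0);
}
```

In the assembly, %rcx holds the start of the buffer and %rdx the end address
used by the inner-loop compare; both are computed once before the loops.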





Need to think about this a bit more; perhaps postreload-gcse can be used
for this instead of LIM...


[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-09-20 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285



Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.2                       |4.7.3



--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-09-20 10:18:54 UTC ---

GCC 4.7.2 has been released.


[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-06-14 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.1                       |4.7.2

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-14 08:29:16 UTC ---
GCC 4.7.1 is being released, adjusting target milestone.


[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown

2012-03-22 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.0                       |4.7.1

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-03-22 08:26:47 UTC ---
GCC 4.7.0 is being released, adjusting target milestone.