On 5/25/06, Jon Smirl <[EMAIL PROTECTED]> wrote:
I ran into another snag: taking the alternative return on a P4 has a
really bad performance impact, since it messes up prefetch. This
sequence is the killer:

       addl    $4, %esp
       ret                                     /* Return to error return */

I can try coding this as a parameter and see how the compiler
generates code differently.

           jmp   *4(%esp)

This is slightly faster than addl, ret.

But my micro-scale benchmarks are heavily influenced by changes in
branch prediction. I still wonder how this would perform in large
programs.
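
To make the sensitivity to branch prediction concrete, here is a rough
sketch of the kind of micro-benchmark I mean. It is not the code I
actually measured: may_fail, fill_periodic, fill_random and run_pattern
are names made up for illustration, and an optimizing compiler may
inline or flatten the branch, so treat any timings as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callee: reports an error (-1) when flag is nonzero. */
int may_fail(int flag)
{
    return flag ? -1 : 0;
}

/* Predictable pattern: an error on every period-th call. */
void fill_periodic(unsigned char *flags, size_t n, size_t period)
{
    assert(period != 0);
    for (size_t i = 0; i < n; i++)
        flags[i] = (i % period == 0);
}

/* Unpredictable pattern: errors chosen by a small xorshift32 generator.
   The seed must be nonzero or the generator stays at zero. */
void fill_random(unsigned char *flags, size_t n, uint32_t seed)
{
    uint32_t s = seed;
    assert(seed != 0);
    for (size_t i = 0; i < n; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        flags[i] = (unsigned char)(s & 1u);
    }
}

/* Call may_fail once per flag, count the errors, and report the CPU
   time used by the loop in *seconds. */
size_t run_pattern(const unsigned char *flags, size_t n, double *seconds)
{
    size_t errors = 0;
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        if (may_fail(flags[i]) != 0)   /* the branch being predicted */
            errors++;
    }
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return errors;
}
```

Running run_pattern over a buffer filled by fill_periodic and then over
one filled by fill_random, with the same n, shows how much of the
measured time comes from the predictor rather than from the return
sequence itself.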

It seems that the sequence

  ret
  test
  jne

is very fast compared to

  jmp   *4(%esp)

even when they both end up at the same place. It looks to me like the
call stack predictor is controlling everything. The only way to make
this work would be to figure out some way to get the alternative
return address into the call stack predictor.
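
For comparison, the fast ret / test / jne sequence is what a compiler
typically emits for the ordinary status-code convention. A minimal
sketch in C, with checked_div and sum_quotients as hypothetical names
chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: returns 0 on success, -1 on error. Both paths
   leave through a plain ret to the same return address, so the
   predicted return target is always the right one. */
int checked_div(int num, int den, int *out)
{
    assert(out != NULL);
    if (den == 0)
        return -1;
    *out = num / den;
    return 0;
}

/* Caller: the error check is a separate test and conditional branch
   after the call returns (typically call, test, jne when the callee
   is not inlined). */
int sum_quotients(const int *num, const int *den, size_t n, int *sum)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int q;
        if (checked_div(num[i], den[i], &q) != 0)
            return -1;              /* propagate the error upward */
        total += q;
    }
    *sum = total;
    return 0;
}
```

The cost on the error path is one extra, usually well-predicted,
conditional branch in the caller instead of a return to an address the
predictor did not expect.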


--
Jon Smirl
[EMAIL PROTECTED]
