------- Comment #6 from pinskia at gmail dot com  2007-03-19 17:52 -------
Subject: Re:  pseudo-optimzation with sincos/cexpi

On 19 Mar 2007 12:43:49 -0000, dominiq at lps dot ens dot fr
<[EMAIL PROTECTED]> wrote:
>
> Since sin() and cos() are non trivial functions, I am very surprised
> that a wrong API makes a 50% difference.

Well, here is how it can make a 50% difference (at least on the Cell;
the 970 is less restrictive and only rejects the dispatch group).
Modern PowerPC processors do not like storing a value to the stack and
then loading it back within a small number of cycles (around 50 cycles
on the Cell, while on the 970 the window is just one dispatch group).
Transferring a value between the integer register set and the floating
point register set can only be done via memory, so you will get an LHS
(load-hit-store) or an LRU reject, depending on which processor you are
on.  This causes either a 50 cycle delay or a reject of the dispatch
group (the latter can cause multiple rejects).  The cycles lost to this
issue can add up when both sides of the function have this hazard.
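
As a rough illustration of the store-then-load pattern described above,
here is a minimal C sketch.  The names pair_via_pointers,
sum_via_pointers and double_to_bits are hypothetical and not taken from
GCC or libm.  The first function only imitates the shape of a
sincos()-style interface that returns two results through pointers, and
the code a given compiler emits for it may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a sincos()-style API: both results come
   back through pointers, so an out-of-line callee has to store them
   to memory. */
static void pair_via_pointers(double x, double *twice, double *half)
{
    *twice = x * 2.0;
    *half = x * 0.5;
}

/* The caller reads the results straight back.  If the call is not
   inlined, these loads follow the callee's stores to the same stack
   slots almost immediately, which is the hazard in question. */
static double sum_via_pointers(double x)
{
    double twice, half;               /* typically stack slots */
    pair_via_pointers(x, &twice, &half);
    return twice + half;
}

/* Moving a value from the floating point side to the integer side.
   memcpy is the portable way to write this in C; on the processors
   discussed here it is expected to become a store followed by a load
   of the same address. */
static uint64_t double_to_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

With the helpers defined as static in one file, an optimizing compiler
may inline them and keep everything in registers, so the hazard mostly
shows up when the callee is a separate library routine.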

Thanks,
Andrew Pinski


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31249
