Let me elaborate a bit more:
Lets use PARSEC Blackscholes as an example. This benchmark uses the floating
version of exp() math function heavily.
Under the glibc/math/s_cexpf.c the floating point version of exp calls
the function:
if (rcls >= FP_ZERO)
{
/* Real part is finite. */
if (icls >= FP_ZERO)
{
/* Imaginary part is finite. */
float exp_val = __ieee754_expf (__real__ x);
float sinix, cosix;
__ieee754_expf which is used to calculate e^x. Each time this function is
called 3 system calls are made, which causes us to spend 50% of our
execution time in the kernel, we are paying a heavy penalty (
http://www.helsinki.fi/atk/unix/dec_manuals/DOC_51/HTML/MAN/MAN3/0388____.HTM
).
ieee754_expf is under the file sysdeps/ieee754/flt-32/e_expf.c:
static const float THREEp22 = 12582912.0;
/* 1/ln(2). */
#undef M_1_LN2
static const float M_1_LN2 = 1.44269502163f;
/* ln(2) */
#undef M_LN2
static const double M_LN2 = .6931471805599452862;
int tval;
double x22, t, result, dx;
float n, delta;
union ieee754_double ex2_u;
fenv_t oldenv;
feholdexcept (&oldenv);* <--Two System Calls here*
#ifdef FE_TONEAREST
fesetround (FE_TONEAREST);
#endif
/* Calculate n. */
n = x * M_1_LN2 + THREEp22;
n -= THREEp22;
dx = x - n*M_LN2;
......
/* Return result. */*
fesetenv (&oldenv); <--System Call Here*
result = x22 * ex2_u.d + ex2_u.d;
return (float) result;
}
First due to the ieee standard the value of the FCPR is preserved (two
systems calls, copy old value then set to 0), then once exponent is
calculated the old value is put back (another system call).
____________
Anyone have any ideas on solving this solution? im looking for a long term
result for parsec benchmarks, one solution that might work is simply
removing the saving and restoring of the FCPR which is just erasing those
system calls. Anyone know if this is a bad idea? It might work for
blackscholes but im worried about long term results on this solution.
Thanks,
EF
On Mon, Mar 29, 2010 at 3:03 PM, ef <[email protected]> wrote:
> Is there any particular reason why IEEE floating point is integrated in the
> cross compiler on the M5 website. It seems to really kill performance, since
> software is responsible for being ieee compliant.
>
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users