On Tue, Nov 12, 2002 at 08:23:50 -0800, Bob Colwell wrote:
> >Yes, you have to specify the use of sse explicitly (I think I mentioned it
> >on IRC when we were benchmarking). It appeared to make zero difference on
> >the athlon, but I didn't check the assembler to see exactly what it was
> >doing. I've heard that just using sse instructions instead of 387 on the
> >P4 is quicker, but I've not tried it. Gcc will do that if you specify -msse
>
> The sse instructions ought to be substantially faster. There are many more
> registers available to support the flops, and they aren't organized into the
> ridiculous 387 stack, so they're easier to reach. I believe they also default
> to round-to-nearest and flush-denormals, but if you care about such niceties
> you should check.
Yes, that is correct AFAIK. However, the particular benchmark we are talking about has a lot of memory access in it, and in that case sse didn't make any difference (or gcc was doing something stupid; I didn't check). I don't understand processor issues well enough to know what the bottleneck would be, but it doesn't appear to be the maths instructions. I will check the effect of sse on my plugins, as they are generally less RAM-hungry, but I don't have a gcc3 machine around at the moment.

- Steve
