dsimcha wrote:
== Quote from Don ([email protected])'s article
Of course, on the occasions when SSE lets you do 4 operations at once,
you get nearly a 4X speedup...

Is SSE(2) inherently faster (at least in real-world implementations) than
x87, even when you don't vectorize?

No. (Except on Pentium 4, where SSE was basically the only part of the CPU that wasn't crippled.)

Would I be able to expect any speedup from
going from x87 to SSE(2) for code that has a decent amount of implicit
instruction-level parallelism but wasn't explicitly vectorized either by me or the compiler?

I doubt it. The only time that you get an easy benefit is when you have a mix of serial and parallel calculations.

float[4] x, y;

float z = some_calculation;
x[] += z*y[];

If you're using SSE for all your calculations, z will already be in an SSE register, so it makes setting up the parallel calculation a bit quicker.
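
A slightly fuller version of the snippet above, as a minimal sketch: the helper names someCalculation and scaledAdd and all the input values are assumptions made up for illustration, not anything from the original code. It only shows the shape of the pattern (one scalar result feeding a four-element array operation). Whether the compiler actually keeps z in an SSE register and emits packed instructions depends on the compiler and its flags.

```d
import std.stdio;

// Hypothetical stand-in for the serial part of the work
// ("some_calculation" in the post); the formula is arbitrary.
float someCalculation(float a, float b)
{
    return a * b + 1.0f;
}

// Adds z * y to x element by element with a D array operation.
// Nothing here forces SSE code generation; that is up to the compiler.
void scaledAdd(float[] x, const(float)[] y, float z)
{
    x[] += z * y[];
}

void main()
{
    float[4] x = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] y = [0.5f, 1.0f, 1.5f, 2.0f];

    float z = someCalculation(2.0f, 3.0f); // serial, scalar step
    scaledAdd(x[], y[], z);                // parallel step over all four elements

    writeln(x);
}
```

If the scalar step runs on x87, z first has to be stored to memory and reloaded into an SSE register before it can be broadcast across the four lanes; if everything is SSE, that round trip is not needed.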

And the compiler might be better at scheduling SSE code than x87 code. But that's not really a processor thing.
