Current Intel/AMD hardware supports the SSE3 extensions,
which build on SSE2 and SSE and, before those, MMX.

To see what a huge advance these are, you have to know what
they replace.  Remember the 8087 numeric coprocessor?
It sat off on the side of the 8086 and snooped the bus,
catching data as it came along and doing numeric operations
when told to.  To minimize the number of signals needed
between the 8086 and the 8087, the 8087 used a stack
architecture: 8 internal registers configured as a loop-
around stack, with the top-of-stack being a preferred
operand.
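A toy model may make the loop-around stack concrete. The C below is a sketch, not an emulator: the struct, the `depth` counter and the helper names (loosely modeled on the FLD, FMULP, FADDP and FSTP instructions) are illustrative assumptions, and exceptions, precision control and tag bits are ignored. Even a*b + c*d has to be threaded through the top of the stack.

```c
#include <assert.h>

/* Toy model (not real hardware) of the 8087's eight registers
   used as a loop-around stack.  ST(i) means regs[(top + i) mod 8]. */
typedef struct {
    double regs[8];
    int top;    /* physical index of ST(0) */
    int depth;  /* number of live entries, to catch overflow */
} X87Stack;

static double *st(X87Stack *s, int i) { return &s->regs[(s->top + i) & 7]; }

/* FLD: push; TOP moves down, wrapping from 0 around to 7 */
static void fld(X87Stack *s, double v) {
    /* real hardware reports stack overflow as an invalid-operation exception */
    assert(s->depth < 8);
    s->top = (s->top + 7) & 7;
    s->regs[s->top] = v;
    s->depth++;
}

static void pop(X87Stack *s) {
    assert(s->depth > 0);
    s->top = (s->top + 1) & 7;
    s->depth--;
}

/* FMULP / FADDP: ST(1) = ST(1) op ST(0), then pop */
static void fmulp(X87Stack *s) { *st(s, 1) *= *st(s, 0); pop(s); }
static void faddp(X87Stack *s) { *st(s, 1) += *st(s, 0); pop(s); }

/* FSTP: store ST(0) to memory and pop */
static double fstp(X87Stack *s) { double v = *st(s, 0); pop(s); return v; }

/* a*b + c*d: every operand must pass through the top of the stack */
double x87_sum_of_products(double a, double b, double c, double d) {
    X87Stack s = { {0}, 0, 0 };
    fld(&s, a); fld(&s, b); fmulp(&s);  /* ST(0) = a*b              */
    fld(&s, c); fld(&s, d); fmulp(&s);  /* ST(0) = c*d, ST(1) = a*b */
    faddp(&s);                          /* ST(0) = a*b + c*d        */
    return fstp(&s);
}
```

`x87_sum_of_products(2, 3, 4, 5)` returns 26. No intermediate ever gets a name of its own; it is only "whatever is at ST(0) right now", which is the register-allocation problem.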

Well, that's what the Pentium has to this very day, if
SSE is not used.  The interface sucks, mostly because
the stack organization makes it pretty much impossible for
a compiler to assign variables to registers for any but
the shortest-term uses.  Heck, even in hand-written code it's tough
to use the registers efficiently.  On top of that, it's
a one-operation-at-a-time instruction set.

SSE3 is better because it has (1) a real register file
that you can assign variables to, and (2) a 128-bit interface,
so one instruction can work on two doubles at once, and
two floating-point units.
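As an illustration of both points, here is a dot product written with compiler intrinsics. It is a sketch assuming a GCC- or Clang-style compiler: the SSE3 path is used only when the compiler defines `__SSE3__` (GCC and Clang do so under `-msse3`), and otherwise the plain loop runs. The function name `dot` is hypothetical. The accumulator `acc` stays in one XMM register for the whole loop, each `_mm_mul_pd`/`_mm_add_pd` handles two doubles, and `_mm_hadd_pd` is one of the instructions SSE3 added.

```c
#include <stddef.h>
#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 intrinsics; also pulls in SSE2 */
#endif

/* Dot product of two double arrays.  On the SSE3 path each 128-bit
   XMM register holds two doubles, and the running sum lives in a
   named register rather than on a stack. */
double dot(const double *a, const double *b, size_t n) {
    size_t i = 0;
    double sum;
#if defined(__SSE3__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);     /* a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);     /* b[i], b[i+1] */
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    acc = _mm_hadd_pd(acc, acc);              /* SSE3: add the two lanes */
    sum = _mm_cvtsd_f64(acc);                 /* extract the low double */
#else
    sum = 0.0;
#endif
    for (; i < n; i++)                        /* odd leftover, or whole scalar path */
        sum += a[i] * b[i];
    return sum;
}
```

On 32-bit x86 without SSE flags, compilers typically emit x87 instructions for the scalar loop, so building the same source both ways is one way to compare the two.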

How much you will benefit from SSE3 will depend on the
extent to which your operands are in cache.  If your
operands have to be fetched from uncached memory, the
8087 instruction set, slow as it is, will almost keep
up with the data (the Pentium takes heroic measures to
keep data moving, including prefetch on up to four separate
vector operands, once it sniffs out that you are working
on vectors).

But if your operands are cached, or if by clever coding you
can keep an operand in cache, SSE3 code will outperform the
8087-style code: my pencil calculations indicate by up to a
factor of 4, though I haven't verified this on real code.
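The cache argument can be put as a crude bound: a loop runs no faster than its arithmetic allows and no faster than its operands can be delivered. The C helpers below are a hypothetical illustration of that reasoning, not a reconstruction of the pencil calculation; every rate is a caller-supplied assumption, and no real Pentium figures are implied.

```c
/* Crude bound: a loop takes at least as long as its arithmetic and
   at least as long as moving its operands, whichever is larger. */
static double est_time(double flops, double bytes,
                       double flop_rate, double bandwidth) {
    double t_compute = flops / flop_rate;
    double t_memory  = bytes / bandwidth;
    return t_compute > t_memory ? t_compute : t_memory;
}

/* Estimated speedup from switching x87 code to SSE code, given the
   bandwidth of wherever the operands live (cache or main memory). */
double est_speedup(double flops, double bytes, double x87_rate,
                   double sse_rate, double bandwidth) {
    return est_time(flops, bytes, x87_rate, bandwidth) /
           est_time(flops, bytes, sse_rate, bandwidth);
}
```

With operands in slow memory the memory term dominates both times and the ratio falls to 1; with operands in cache the compute term dominates and the ratio approaches `sse_rate / x87_rate`.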

So, I think that turning on SSE3 instructions will give a modest
performance improvement on big arrays, and a bigger improvement
on small operands that are usually hanging around in cache:
2x or better on the numeric part of short vector + short vector,
though the overall gain will be smaller once overheads are
taken into account.
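The short vector + short vector case might look like the following C sketch (the name `vec_add` is hypothetical). With SSE2, which every SSE3 processor also has and which GCC and Clang enable by default on x86-64, each `_mm_add_pd` performs two of the additions; that is where the numeric-part gain comes from. The loads, stores and loop bookkeeping are still there, and they are part of the overhead that shrinks the overall improvement.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* z = x + y for short double vectors: two additions per SSE2
   instruction, plus a scalar step for an odd trailing element. */
void vec_add(double *z, const double *x, const double *y, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(z + i, _mm_add_pd(_mm_loadu_pd(x + i),
                                        _mm_loadu_pd(y + i)));
#endif
    for (; i < n; i++)
        z[i] = x[i] + y[i];
}
```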

I think that Jsoftware has taken the view so far that this
level of improvement is not worth the hassle of having to
support the extra builds.

Certain operations, notably matrix multiply, could improve by
more than 2x if the SSE3 instructions were used in hand-tweaked
code that maximizes cache usage.
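Cache-conscious matrix multiply is usually done by loop tiling. The C below is a plain-C sketch of tiling only, with a hypothetical tile size `BLOCK`; a hand-tuned version would also use SSE instructions in the innermost loop and choose tile sizes for a particular cache. It is not a description of any J implementation.

```c
#include <stddef.h>

#define BLOCK 32  /* hypothetical tile size; tune so the working tiles fit in cache */

/* C = A * B for n x n row-major matrices, computed tile by tile so
   the pieces of A, B and C being worked on stay resident in cache. */
void matmul_blocked(double *C, const double *A, const double *B, size_t n) {
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t imax = ii + BLOCK < n ? ii + BLOCK : n;
                size_t kmax = kk + BLOCK < n ? kk + BLOCK : n;
                size_t jmax = jj + BLOCK < n ? jj + BLOCK : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        /* contiguous inner loop over j: the part that
                           hand-written SSE code could process two at a time */
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Each tile of B is reused for a whole block of rows of A while it is still in cache, instead of being fetched again from memory for every row.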

Henry Rich
 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Miller, Raul D
> Sent: Monday, October 30, 2006 11:24 AM
> To: General forum
> Subject: RE: [Jgeneral] J 6.01 on Intel Mac
> 
> Alistair Tucker wrote:
> >     I have read that Core2's specialised vector processor (SSE)
> > runs at fully twice the speed of the Core's.  Presumably that
> > means that an array-centric application like J will run twice
> > as fast?
> 
> Unlikely.
> 
> SSE's optimizations work around bandwidth limitations in the CPU
> (such as would hit you in the context of large arrays) in a
> fashion which is most useful when dealing with small fixed-size
> arrays.
> 
> SSE would probably be useful in the context of some of J's
> "special code" -- special case algorithms which take advantage
> of the restrictions implied by certain sequences of operations.
> But I doubt very much that SSE would be much use for J's core
> operations.
> 
> On top of that, SSE does not work on all PCs.  This means that
> if SSE were used, we'd either see a different executable which
> supports SSE, or J would be larger (as it would need to incorporate
> an additional non-SSE implementation of every routine which
> supports SSE).
> 
> Finally, if the ISI folks had gone to the effort of providing
> "special SSE" code, I think it would be documented in the
> release notes.
> 
> -- 
> Raul
> 
> 
> ----------------------------------------------------------------------
> For information about J forums see 
> http://www.jsoftware.com/forums.htm
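Raul's point in the quoted message, that using SSE means either a separate executable or a non-SSE twin of every SSE routine, corresponds to what is usually done with run-time CPU dispatch. The C sketch below ships both versions and picks one the first time it is called. It assumes GCC or Clang on x86 (`__builtin_cpu_supports` and `__attribute__((target("sse3")))` are compiler extensions, not standard C); the function names are hypothetical, and nothing here describes how J itself is built.

```c
#include <stddef.h>

/* The x86 path relies on GCC/Clang extensions; other compilers and
   non-x86 targets get only the portable version. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <pmmintrin.h>
#endif

typedef double (*sum_fn)(const double *, size_t);

/* Portable version: runs on any PC. */
static double sum_plain(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

#ifdef HAVE_X86_DISPATCH
/* SSE3 version; the target attribute lets just this function use
   SSE3 without compiling the whole program with -msse3. */
__attribute__((target("sse3")))
static double sum_sse3(const double *x, size_t n) {
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(x + i));
    acc = _mm_hadd_pd(acc, acc);        /* SSE3: add the two lanes */
    double s = _mm_cvtsd_f64(acc);
    for (; i < n; i++)
        s += x[i];
    return s;
}
#endif

/* Ask the CPU at run time which version it can execute. */
static sum_fn choose_sum(void) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse3"))
        return sum_sse3;
#endif
    return sum_plain;
}

/* Public entry point: picks an implementation on first call
   (not thread-safe as written) and reuses it afterwards. */
double sum(const double *x, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = choose_sum();
    return impl(x, n);
}
```

The cost is the one Raul names: both versions of every such routine end up in the binary.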
