On Fri, 15 Sep 2006 16:47:14 +0000 (UTC), you wrote:

> I am however aware that vectorization has a somewhat different meaning in
> programming terms than the above, but am not sufficiently educated on the
> topic to make an informed choice, so I've simply left gcc to go with its
> default choice given my overall stated intention of -Os.
Older super-computers, especially those designed or inspired by Seymour Cray, included "vector registers". These were multiple registers (typically a small power of 2 -- say 32, 64, or 128) that could be manipulated as a unit. This was an earlier form of SIMD. By issuing a single instruction -- such as a vector load or vector add -- you repeated the same operation on a sequence of operands. The crucial difference was that these vector operations had some start-up overhead and then ran autonomously, delivering one result every clock tick for the length of the vector register.

While some compilers added proprietary language extensions to support vector values as actual data types, most numeric code was written in scalar form. To make such super-computers useful it was crucial that compilers be able to recognize when a scalar loop could be implemented using the machine's vector facilities. Fundamentally this came down to figuring out when successive loop iterations were independent and hence could execute in parallel. Since the compiler was attempting to re-express scalar loops as loops using vector primitives, the optimization became known as "vectorization".

In essence, a multi-media SIMD mechanism is very similar. A 64-bit register containing four 16-bit operands is essentially a length-4 "vector register". Finding opportunities to use such SIMD instructions in scalar code requires exactly the same forms of analysis and optimization.

/john

--
[email protected] mailing list
