Olivier Crete <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted below, on Fri,
15 Sep 2006 13:18:49 -0400:

> On Fri, 2006-15-09 at 10:08 -0700, [EMAIL PROTECTED] wrote:
>> On Fri, Sep 15, 2006 at 04:47:14PM +0000, Duncan wrote:
>> 
>> > I'm unclear as to what "vectorization" means as used here.  My
>> > understanding of "vector" is as a synonym for "line", thus implying
>> > loop unrolling of some form or another, which will increase size.
>> > 
>> > I am however aware that vectorization has a somewhat different
>> > meaning in programming terms than the above, but am not sufficiently
>> > educated on the topic to make an informed choice[.]
>> > 
>> > If you can sufficiently explain the concept to me such that I
>> > understand enough about it to feel comfortable going with other than
>> > the default (which means I can explain why I chose it and why it
>> > won't interfere with my overall strategy as outlined in the
>> > grandparent, or is worth it even if it does), I'd be very grateful!
>> 
>> Back in the day, vectorization was, I believe, a supercomputer SIMD
>> (single instruction multiple data) concept, where instruction operands
>> were pointers to data, so it would, for instance, add two arrays of
>> numbers to produce a third array.  Isn't this what the Altivec
>> instructions do?
> 
> That's exactly what it means. On x86/amd64, some MMX, SSE/SSE2 and 3dnow
> operations are SIMD operations. Vectorizing a loop means that if you
> add two arrays of, let's say, 12000 elements, instead of running the
> loop 12000 times processing 1 element each time, it runs the loop, say,
> 3000 times processing 4 elements each time. That should be faster...
> (but isn't always, depending on whether the vector ops have been
> implemented properly).

I was somewhat aware of that, but hadn't considered the effect on loops,
and don't understand it well enough to explain it as you did, nor enough
to grok why, if it's so much more efficient, gcc doesn't do it by default,
at least on archs specified precisely enough that it knows the
instructions are there and that it makes sense.  (amd64 is new enough not
to have all the different generations of mmx/sse/sse2/etc/etc, so it
should make sense there, as it would on x86 with -march=pentium4 or
whatever, where gcc knows what vectorization levels are available -- as
opposed to plain pentium, which was pre-mmx, let alone the later
vectorization extensions.)

IOW, that explains why it should be more efficient, but not why gcc isn't
already doing it on amd64, or maybe it is, and specifying the flag would
be redundant?  This is precisely what I mean when I say I don't have
enough information to make a defensible decision, so I've chosen to stick
with the safe defaults.  If it's not being done by default, there's likely
a good reason somewhere, and lacking enough information to make an
informed decision, the defaults are the safe way to go.

This is also one of those places where the manpage is frustratingly
uninformative.  The entry for -ftree-vect-loop-version explains that it's
enabled by default, and that where compile time can't tell for sure that
vectorizing is possible, both vectorized and unvectorized versions of the
loop are created -- /except/ at -Os.  Since this flag forces duplicated
code in some cases, disabling it for -Os makes perfect sense, so no
problem there.  The problem is the implication that where gcc /can/ tell
vectorization is possible, it should be vectorizing by default as well --
only the manpage never /says/ so, neither under the -ftree-vectorize
description itself, nor in the lists of what gets enabled by default at
the various -OX levels.  The documentation therefore leaves the question
of whether vectorization is enabled by default very much up in the air,
implying it is in the description of something else, but nowhere stating
it explicitly one way or the other.
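One way to settle it empirically, rather than from the manpage, would be
to ask the vectorizer itself to report what it does (this is just my
sketch; the verbosity flag and its output wording may vary by gcc
version):

```shell
# Compile a test file once without and once with the flag, asking the
# vectorizer to report which loops it transforms.
gcc -O2 -march=k8 -ftree-vectorizer-verbose=2 -c test.c
gcc -O2 -march=k8 -ftree-vectorize -ftree-vectorizer-verbose=2 -c test.c
# If only the second invocation reports vectorized loops, then plain -O2
# evidently does NOT enable -ftree-vectorize by default.
```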

Another example of an unclearly specified default is -ftree-pre.  It's
certainly the default for -O2 and -O3.  The section on -Os doesn't say
it's disabled there, while stating that -Os enables all the -O2 options
except those that would increase size, and there's no direct indication
that -ftree-pre increases size -- but its description names -O2 and -O3
specifically and only those, so one is left wondering which side of
"-O2 except where that would increase size" it falls on, and why.  As you
can see, I've chosen to include it in my CFLAGS because it seems like it
should be of benefit (compare -ftree-fre, enabled at -O and higher,
including -Os) and shouldn't increase size /too/ much, just in case it's
/not/ the default for -Os for some reason.

With -ftree-pre I can be pretty sure it's safe to include, since -O2 is
known to include it; -ftree-vectorize is different, as nothing says
/where/ it's the default (if anywhere), tho as I explained, it's implied
as a default by the description in the -ftree-vect-loop-version entry.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
[email protected] mailing list
