Olivier Crete <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Fri, 15 Sep 2006 13:18:49 -0400:
> On Fri, 2006-15-09 at 10:08 -0700, [EMAIL PROTECTED] wrote:
>> On Fri, Sep 15, 2006 at 04:47:14PM +0000, Duncan wrote:
>>
>> > I'm unclear as to what "vectorization" means as used here. My
>> > understanding of "vector" is as a synonym for "line", thus implying
>> > loop unrolling of some form or another, which will increase size.
>> >
>> > I am however aware that vectorization has a somewhat different
>> > meaning in programming terms than the above, but am not sufficiently
>> > educated on the topic to make an informed choice[.]
>> >
>> > If you can sufficiently explain the concept to me such that I
>> > understand enough about it to feel comfortable going with other than
>> > the default (which means I can explain why I chose it and why it
>> > won't interfere with my overall strategy as outlined in the
>> > grandparent, or is worth it even if it does), I'd be very grateful!
>>
>> Back in the day, vectorization was, I believe, a supercomputer SIMD
>> (single instruction multiple data) concept, where instruction operands
>> were pointers to data, so it would, for instance, add two arrays of
>> numbers to produce a third array. Isn't this what the Altivec
>> instructions do?
>
> That's exactly what it means. On x86/amd64 some MMX, SSE/SSE2 and 3dnow
> operations are SIMD operations. Vectorizing a loop means that if you try
> to add two tables of, let's say, 12000 elements, instead of doing the
> loop 12000 times for 1 element each time, it will do the loop, let's
> say, 3000 times with 4 elements each time. Which should be faster...
> (but isn't always, depending on whether the vector ops have been
> implemented properly).

I was somewhat aware of that, but hadn't considered the effect on loops. I don't understand it well enough to explain it as you did, nor enough to grok why, if it's so much more efficient, gcc doesn't do it by default, at least on archs sufficiently specified that the compiler knows the instructions are there and that it makes sense.
(amd64 is new enough not to have all the different generations of mmx/sse/sse2/etc/etc, so it should make sense there, as it would on x86 with -march=pentium4 or whatever, since the compiler then knows what vectorization levels are available -- as opposed to plain pentium, which was pre-mmx, let alone the later vectorization extensions.) IOW, that explains why it should be more efficient, but not why gcc isn't already doing it on amd64. Or maybe it is, and specifying the flag would be redundant?

This is precisely what I mean when I say I don't have enough information to make a defensible decision, so I've chosen to stick with the safe defaults. If it's not being done by default, there's likely a good reason somewhere, and lacking enough information to make an informed decision, the defaults are the safe way to go.

This is also one of those places where the manpage is frustratingly uninformative. The entry for -ftree-vect-loop-version explains that it's enabled by default, and that both vectorized and unvectorized versions of loops are created where compile time can't tell for sure that vectorizing is possible -- /except/ under -Os. Since this flag forces double code in some cases, disabling it for -Os makes perfect sense, so no problem there. The problem is that this implies that where the compiler /can/ tell vectorization is possible, it should be vectorizing by default as well -- only the manpage never /says/ it does that by default, neither under the regular -ftree-vectorize description, nor in the lists of what gets enabled by default at the various -OX levels. The documentation therefore leaves the question of whether vectorization is enabled by default very much up in the air: implied in the description of something else, but nowhere stated explicitly one way or the other.

Another example of an unclearly specified default is -ftree-pre.
-ftree-pre is certainly the default for -O2 and -O3, and the section on -Os doesn't say it's disabled there, while saying -Os enables all of -O2 except where that would increase size; and there's no direct indication that this one increases size. But the description for -ftree-pre names only -O2 and -O3 specifically, so one is left wondering which side of "all of -O2 except where that would increase size" it falls on, and why.

As you can see, I've chosen to include it in my CFLAGS because it seems like it should be of benefit (compare -ftree-fre, enabled at -O and higher, including -Os) and shouldn't increase size /too/ much, just in case it's /not/ the default for -Os for some reason. With -ftree-pre I can be pretty sure it's safe to include, since -O2 is known to include it, but -ftree-vectorize is different, as there's nothing saying /where/ it's the default (if anywhere), tho as I explained, it's implied as the default by the description of the -ftree-vect-loop-version entry.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
-- 
[email protected] mailing list
