On Sep 23, 2005, at 5:30 PM, Ilya Lipovsky wrote:
Hello all,
I am asking this, because we're having some problems with those
builtins inlining instructions properly when a certain level of logic
complexity (in loops) arises. Even worse, gcc 4.0 (both 4.0.0 and
4.0.1) generates bad code (whereas gcc 3.4 is OK). 3.4 simply resorts
to a series of calls to compiler generated routines (with mangled
names such as "_Z7vec_addU8_vectorfs" probably corresponding to the
vec_add's builtin) instead of inlining actual instructions. Again that
happens when a certain level of code mass is reached. gcc 4.0 tries to
do the same but, apparently, something goes wrong. I didn't inspect
the produced assembly code in depth. Currently, I can only give
examples in terms of the macstl expressions and compiler generated
assembly output, but if you request, I can try to write a more direct
loop that uses the vec_* code.
As always, read http://gcc.gnu.org/bugs.html and file a bug. The
inlining problem has been fixed for 4.1.0 by the changes mentioned
below. Also, the smaller the testcase, the better.
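
For illustration only, a reduced testcase of that kind could look
roughly like the sketch below. It is not taken from the macstl code in
question: the names add_arrays and run_testcase, the array size, and
the input values are all made up here. The vec_add path is compiled
only when __ALTIVEC__ is defined (for example with gcc -O2 -maltivec on
PowerPC); elsewhere a scalar fallback keeps the file buildable. Whether
a loop this simple is enough to trigger the problem is not known.

```c
#include <assert.h>
#include <string.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

#define N 16 /* hypothetical size, a multiple of 4 floats per vector */

/* Hypothetical kernel: out[i] = a[i] + b[i] for 0 <= i < n. */
static void add_arrays(const float *a, const float *b, float *out, int n)
{
    int i = 0;
    assert(n >= 0);
#ifdef __ALTIVEC__
    /* Vector path: four floats per iteration through vec_add. */
    for (; i + 4 <= n; i += 4) {
        vector float va, vb, vc;
        memcpy(&va, a + i, sizeof va); /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vc = vec_add(va, vb);
        memcpy(out + i, &vc, sizeof vc);
    }
#endif
    /* Scalar path: handles the tail, or everything without AltiVec. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Runs the kernel on known inputs and returns the number of mismatches. */
static int run_testcase(void)
{
    float a[N], b[N], out[N];
    int i, bad = 0;

    for (i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = (float)(2 * i);
    }
    memset(out, 0, sizeof out);
    add_arrays(a, b, out, N);

    for (i = 0; i < N; i++)
        if (out[i] != (float)(3 * i)) /* small integers are exact in float */
            bad++;
    return bad;
}
```

Calling run_testcase() from main and comparing the assembly produced by
3.4, 4.0.x and 4.1.0 would show whether vec_add was inlined or turned
into an out-of-line call.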
So does anyone of you, compiler writers, know why the builtins are
needed?
The builtins are required so the compiler can schedule the code
correctly. In 4.1.0, the altivec.h header has been rewritten so that it
no longer uses inline functions but uses some builtins directly.
Also, does anyone care that this stuff doesn't really work?
Well, there was a rewrite of the altivec.h header for 4.1.0 which
should improve this. Also, both 4.0.0 and 4.1.0 have autovectorization,
which should expose more bugs. And sometime during 3.4.0, 4.0.0, or
4.1.0, an altivec testsuite was added.
Thanks,
Andrew Pinski