------- Comment #7 from rsandifo at gcc dot gnu dot org 2006-06-06 08:54 -------
Based on David's description, a reduced testcase appears to be:

    static short f[100];
    int bar (void) { return f[0]; }
    void foo (void) { int i; for (i = 0; i < 100; i++) f[i]++; }

Looking at the assembly output of "-O2 -ftree-vectorize -maltivec -mabi=altivec", it seems that "f" will only be guaranteed 2-byte alignment with -fsection-anchors. With -fno-section-anchors, "f" gets the expected 16-byte alignment.

This is an ordering problem. gcc is compiling bar() first, and generating code on the assumption that "f" has natural alignment. The vectoriser then increases the alignment of "f", which throws off any layout based on the original natural alignment.

If bar() is compiled first, then gcc really does need to be able to place "f" at a fixed offset in its section, so that it can use section anchors to access "f". So I think the possible fixes are:

  (1) Don't use section anchors for "f" in bar()
  (2) Don't increase the alignment of "f" in foo()
  (3) Increase the alignment of "f" before compiling either foo() or bar()

(1) implies either (1a) not using section anchors for vectorisable variables or (1b) disabling -fsection-anchors when -ftree-vectorize is in effect.

(2) implies either (2a) not increasing the alignment of variables that have already been assigned a block offset or (2b) preventing -ftree-vectorize from increasing alignment when -fsection-anchors is in effect.

(3) implies increasing the alignment of all vectorisable variables if both -fsection-anchors and -ftree-vectorize are in effect.

Neither (2a) nor (2b) is acceptable IMO. (I don't think (2a) is acceptable because the order of compilation is not guaranteed.) (1) is a worst-case fall-back position, with (1a) obviously being better than (1b).

(3) seems more appealing, but only if we accept that -fsection-anchors -ftree-vectorize may increase the alignment of variables that do not in fact get vectorised. This is going to be a data size hit. (Hopefully it will only be a small hit, and I suppose -ftree-vectorize is already a "speed over size" optimisation.)

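For illustration only, the effect of (3) on this one testcase can be approximated at source level by requesting the larger alignment before either function is seen. This is a sketch rather than a proposed fix: it assumes 16 bytes is the vector alignment wanted (as for AltiVec), and the helper f_is_16_byte_aligned() is a hypothetical name added just to check the result.

```c
#include <stdint.h>

/* Ask for vector alignment up front, so that the layout of "f" is
   already final when bar() is compiled.  The value 16 is an assumption
   matching the AltiVec vector size.  */
static short f[100] __attribute__ ((aligned (16)));

int bar (void) { return f[0]; }

void foo (void)
{
  int i;
  for (i = 0; i < 100; i++)
    f[i]++;
}

/* Hypothetical helper: returns nonzero if "f" starts on a 16-byte
   boundary.  */
int f_is_16_byte_aligned (void)
{
  return ((uintptr_t) f % 16) == 0;
}
```

With the alignment fixed in the declaration, the vectoriser has nothing left to increase for this variable, so the order in which bar() and foo() are compiled should no longer matter here. It also shows the size cost mentioned above: the padding is paid whether or not the loop ends up vectorised.
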
If we choose (1) or (3), I suppose we should also add a gcc_assert() that the vectoriser is not increasing the alignment of a variable that has already been placed in a block (i.e. assert that (2a) would then be a no-op).

Richard

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27770