Re: Notes from tinkering with the autovectorizer (4.1.1)

Erich Plondke Wed, 27 Sep 2006 08:18:04 -0700

Dorit Nuzman wrote:

> Indeed on altivec we implement the 'mask_for_load(addr)' builtin using
> 'lvsr(neg(addr))', that feeds the 'realign_load' (which is a 'vperm' on
> altivec).
> I'm not too familiar with the ARM WMMX ISA, but couldn't you use a similar
> trick, i.e., instead of using the low bits of the address for the shift
> amount that feeds the realign_load, use shift=(VECSIZE - neg(addr))? I
> think this should give shift amount VECSIZE for the aligned case (and
> hopefully the correct shift amounts for the unaligned cases).
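A quick way to check the suggested formula is to tabulate it. This sketch assumes an 8-byte vector (VECSIZE = 8, one 64-bit WMMX register) and reads neg(addr) as its low bits, i.e. (-addr) & (VECSIZE - 1); the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECSIZE 8u  /* assumed vector size in bytes (one 64-bit WMMX register) */

/* Hypothetical helper: shift = VECSIZE - (neg(addr) mod VECSIZE). */
static unsigned suggested_shift(uintptr_t addr)
{
    /* The mask only equals "mod VECSIZE" when VECSIZE is a power of two. */
    assert((VECSIZE & (VECSIZE - 1)) == 0);
    unsigned neg_low = (unsigned)(-addr & (VECSIZE - 1));
    return VECSIZE - neg_low;
}
```

For addresses 0x1000 through 0x1007 this gives 8, 1, 2, ..., 7: VECSIZE in the aligned case and the plain misalignment otherwise, as the quoted suggestion expects.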


On Altivec, which on all targets is apparently big endian, you would think you
would want to shift elements left (lower addresses, more significant) in order
to align them.

Instead we shift right (higher addresses / less significant) by the negated
amount to get the behavior the hook wants:
0 --> 0   (get more significant vector)
1 --> 15
2 --> 14
...
15 --> 1

This works because Altivec can shift either way arbitrarily.
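The table is just (-addr) & 15 fed to lvsr. A rough C emulation shows the effect, assuming the usual AltiVec definitions: lvsr yields byte indices 16-sh through 31-sh (sh being the low four bits of its address operand), and vperm picks bytes from the 32-byte concatenation vA:vB. The *_emul names are placeholders, not real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated lvsr: mask byte i is (16 - sh) + i, where sh = addr & 15. */
static void lvsr_emul(uintptr_t addr, uint8_t mask[16])
{
    unsigned sh = (unsigned)(addr & 15);
    for (unsigned i = 0; i < 16; i++)
        mask[i] = (uint8_t)(16 - sh + i);
}

/* Emulated vperm: each mask byte (low 5 bits) indexes the 32 bytes vA:vB. */
static void vperm_emul(const uint8_t va[16], const uint8_t vb[16],
                       const uint8_t mask[16], uint8_t out[16])
{
    for (unsigned i = 0; i < 16; i++) {
        unsigned idx = mask[i] & 31u;
        out[i] = idx < 16 ? va[idx] : vb[idx - 16];
    }
}

/* realign_load as described above: mask = lvsr(neg(addr)), then vperm. */
static void altivec_realign(uintptr_t addr, const uint8_t vec1[16],
                            const uint8_t vec2[16], uint8_t out[16])
{
    uint8_t mask[16];
    lvsr_emul((uintptr_t)0 - addr, mask);
    vperm_emul(vec1, vec2, mask, out);
}
```

For a misalignment m of 1..15 the lvsr input is 16 - m, so the mask is m..m+15 and the result is bytes m through m+15 of vec1:vec2. Only the aligned case produces a mask of 16..31, which selects vec2 unchanged.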

But on WMMX, which is little endian only, we only have an instruction to shift
towards lower addresses.  This is of course the behavior you would expect at
first glance; to obtain an aligned vector from an unaligned address r you do:
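  and     r_floor,r,#-8      /* round r down to an 8-byte boundary */
  wldrd   wr0,[r_floor]      /* aligned doubleword containing r */
  wldrd   wr1,[r_floor,#8]   /* the next aligned doubleword */
  walignr wr2,wr0,wr1,r      /* The "r" in the mnemonic is for "register" */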
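The sequence can be modelled in plain C to make the direction of the shift explicit. This is a sketch under assumptions: registers are modelled as little-endian uint64_t values, offsets are taken relative to an 8-byte-aligned base, the shift count comes from the low three bits of r, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes at p as a little-endian value, like wldrd. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical model of walignr: bytes s..s+7 of the 16 bytes wr0 then wr1
   (wr0 at the lower address), with s taken from the low 3 bits of r. */
static uint64_t walignr_emul(uint64_t wr0, uint64_t wr1, uintptr_t r)
{
    unsigned s = (unsigned)(r & 7);
    if (s == 0)
        return wr0;                       /* avoid an undefined 64-bit shift */
    return (wr0 >> (8 * s)) | (wr1 << (8 * (8 - s)));
}

/* The four-instruction sequence above, for a byte offset into buf. */
static uint64_t unaligned_load(const uint8_t *buf, uintptr_t offset)
{
    uintptr_t floor_off = offset & ~(uintptr_t)7;   /* and r_floor,r,#-8 */
    uint64_t wr0 = load_le64(buf + floor_off);      /* wldrd wr0,[r_floor] */
    uint64_t wr1 = load_le64(buf + floor_off + 8);  /* wldrd wr1,[r_floor,#8] */
    return walignr_emul(wr0, wr1, offset);          /* walignr wr2,wr0,wr1,r */
}
```

Note that a shift of 0 returns wr0, the vector at the lower address; the model has no way to select wr1 on its own.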

There is no align going the other way, because it would be strange and (to the
architects, I guess) unnecessary if you are only ever little endian.

Indeed, in your paper (grin) "Multi-platform Auto-vectorization" you

       define the functionality of realign load in terms of mis - the
       misalignment of the address (i.e., address&(VS-1)), as follows: The
       last VS-mis bytes of vector vec1 are concatenated to the first mis
       bytes of the vector vec2.

This is what the walign instruction does, but it's not quite what we
ended up with in GCC.
In the case that mis is 0, the GCC hook wants to end up with vec2, not vec1.
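The mismatch is easiest to see side by side. In this sketch the function names are illustrative, and the hook's mis == 0 behaviour is taken from this post rather than from GCC's source; the two definitions agree for every nonzero misalignment and differ only when mis is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Realign as defined in the paper (and done by walign): the last vs-mis
   bytes of vec1 followed by the first mis bytes of vec2. */
static void realign_paper(const uint8_t *vec1, const uint8_t *vec2,
                          size_t mis, size_t vs, uint8_t *out)
{
    assert(mis < vs);
    for (size_t i = 0; i < vs; i++)
        out[i] = (mis + i < vs) ? vec1[mis + i] : vec2[mis + i - vs];
}

/* What the GCC hook expects, per this post: identical for mis != 0,
   but vec2 instead of vec1 when the address is aligned. */
static void realign_gcc_hook(const uint8_t *vec1, const uint8_t *vec2,
                             size_t mis, size_t vs, uint8_t *out)
{
    if (mis == 0) {
        for (size_t i = 0; i < vs; i++)
            out[i] = vec2[i];
        return;
    }
    realign_paper(vec1, vec2, mis, vs, out);
}
```

With 8-byte vectors holding bytes 0..7 and 8..15, both functions return bytes 3..10 for mis = 3, but for mis = 0 the first returns 0..7 and the second 8..15.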

So for architectures that can align both ways, the current method is fine, but
if the architecture is designed for one endianness only, we are going to have
trouble exploiting the alignment feature.

Thanks,

   Erich

--
Why are ``tolerant'' people so intolerant of intolerant people?
