> 1) The definition of the realignment instruction doesn't match hardware for
> instruction sets like ARM WMMX, where aligned amounts shift by 0 bytes
> instead of VECSIZE bytes. This makes it useless for vector realignment,
> because in the case that the pointer happens to be aligned, we get the
> wrong vector. Looks like the SPARC realignment hook does the same thing...
> Indeed, it looks like Altivec is the only one to support it, and they do
> some trickery with shifting the wrong (against endianness) way based on the
> two's complement of the source (a very clever trick). No other machine
> (evidently) can easily meet the description of the current realignment
> mechanism.
Indeed, on altivec we implement the 'mask_for_load(addr)' builtin using
'lvsr(neg(addr))', which feeds the 'realign_load' (which is a 'vperm' on
altivec). I'm not too familiar with the ARM WMMX ISA, but couldn't you use a
similar trick - i.e. instead of using the low bits of the address as the
shift amount that feeds the realign_load, use shift = (VECSIZE - neg(addr))?
I think this should give a shift amount of VECSIZE for the aligned case (and
hopefully the correct shift amounts for the unaligned cases).

dorit
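
For concreteness, a minimal sketch of the suggested shift-amount arithmetic
(illustrative C only, assuming VECSIZE = 16; not actual GCC, Altivec, or WMMX
code). Taking the low bits of neg(addr) and subtracting them from VECSIZE
gives VECSIZE for an aligned address and the plain misalignment otherwise,
whereas using the low bits of addr directly gives 0 for the aligned case:

    #include <stdio.h>
    #include <stdint.h>

    #define VECSIZE 16u

    int
    main (void)
    {
      for (uintptr_t addr = 0; addr < VECSIZE; addr++)
        {
          /* What WMMX-style hardware uses: low bits of the address.  */
          unsigned low = (unsigned) addr & (VECSIZE - 1);
          /* Low bits of neg(addr), as lvsr effectively computes.  */
          unsigned neg_low = (unsigned) -addr & (VECSIZE - 1);
          /* The suggested shift amount: VECSIZE - neg(addr).  */
          unsigned shift = VECSIZE - neg_low;

          printf ("misalignment %2u: addr low bits = %2u, "
                  "VECSIZE - neg(addr) = %2u\n", low, low, shift);
        }
      return 0;
    }

Running this prints shift 16 (= VECSIZE) for misalignment 0 and shift m for
each misalignment m = 1..15, which is the mapping the reply above describes.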