Richard Biener <richard.guent...@gmail.com> writes: > On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks >> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered >> lanes. This meant that both the SVE patterns and the handling of >> fully-masked loops were wrong. >> >> The patch deals with that by making sure that all vec_unpack* optabs >> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate >> define_insn. This in turn meant that we can get rid of the duplication >> between the signed and unsigned patterns for predicates. (We provide >> implementations of both the signed and unsigned optabs because the sign >> doesn't matter for predicates: every element contains only one >> significant bit.) >> >> Also, the float unpacks need to unpack one half of the input vector, >> but the unpacked upper bits are "don't care". There are two obvious >> ways of handling that: use an unpack (filling with zeros) or use a ZIP >> (filling with a duplicate of the low bits). The code previously used >> unpacks, but the sequence involved a subreg that is semantically an >> element reverse on big-endian targets. Using the ZIP patterns avoids >> that, and at the moment there's no reason to prefer one over the other >> for performance reasons, so the patch switches to ZIP unconditionally. >> As the comment says, it would be easy to optimise this later if UUNPK >> turns out to be better for some implementations. >> >> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu. >> I think the tree-vect-loop-manip.c part is obvious, but OK for the >> AArch64 bits? >> >> Richard >> >> >> 2018-01-26 Richard Sandiford <richard.sandif...@linaro.org> >> >> gcc/ >> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks): >> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR >> for big-endian. >> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute. >> * config/aarch64/aarch64-sve.md >> (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to... >> (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this. >> (*extend<mode><Vwide>2): Rename to... >> (aarch64_sve_extend<mode><Vwide>2): ...this. >> (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand, >> renaming the old pattern to... >> (aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define >> unsigned packs. >> (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a >> define_expand, renaming the old pattern to... >> (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this. >> (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete. >> (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into >> account when deciding which SVE instruction the optab should use. >> (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise. >> >> gcc/testsuite/ >> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather >> than unpacks. >> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise. >> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise. >> >> Index: gcc/tree-vect-loop-manip.c >> =================================================================== >> --- gcc/tree-vect-loop-manip.c 2018-01-13 18:02:00.948360274 +0000 >> +++ gcc/tree-vect-loop-manip.c 2018-01-26 14:00:15.717916957 +0000 >> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se >> { >> tree src = src_rgm->masks[i / 2]; >> tree dest = dest_rgm->masks[i]; >> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR >> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1) >> + ? VEC_UNPACK_HI_EXPR >> : VEC_UNPACK_LO_EXPR); > > This looks broken anyways -- the element oder on GIMPLE is the same > for little and big-endian. I see two other BYTES_BIG_ENDIAN uses in > tree-vect-*, not sure when they crept in but they are all broken.
Are you sure? Everything apart from this new code seems consistently to treat UNPACK_HI as "high part of the register" rather than "high memory address/lane number". I think changing it now would break powerpc64 and mips. Thanks, Richard