On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse <marc.gli...@inria.fr> wrote: > > On Fri, 7 Aug 2020, Richard Biener wrote: > > > On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.gli...@inria.fr> wrote: > >> > >> On Thu, 6 Aug 2020, Christophe Lyon wrote: > >> > >>>> Was I on the right track configuring with > >>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9 > >>>> --with-fpu=neon-fp16 > >>>> then compiling without any special option? > >>> > >>> Maybe you also need --with-float=hard, I don't remember if it's > >>> implied by the 'hf' target suffix > >> > >> Thanks! That's what I was missing to reproduce the issue. Now I can > >> reproduce it with just > >> > >> typedef unsigned int vec __attribute__((vector_size(16))); > >> typedef int vi __attribute__((vector_size(16))); > >> vi f(vec a,vec b){ > >> return a==5 | b==7; > >> } > >> > >> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 > >> -fdisable-tree-forwprop3 -O1 > >> > >> _1 = a_5(D) == { 5, 5, 5, 5 }; > >> _3 = b_6(D) == { 7, 7, 7, 7 }; > >> _9 = _1 | _3; > >> _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, > >> 107); > >> > >> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns > >> false), while with -fdisable-tree-forwprop4 we do manage to expand > >> > >> _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 > >> }, 112); > >> > >> It doesn't make much sense to me that we can expand the more complicated > >> form and not the simpler form of the same operation (both compare a to 5 > >> and produce a vector of -1 or 0 of the same size), especially when the > >> target has an instruction (vceq) that does just what we want. > >> > >> Introducing boolean vectors was fine, but I think they should be real > >> types, that we can operate on, not be forced to appear only as the first > >> argument of a vcond. > >> > >> I can think of 2 natural ways to improve things: either implement vector > >> comparisons in the ARM backend (possibly by forwarding to their existing > >> code for vcond), or in the generic expansion code try using vcond if the > >> direct comparison opcode is not provided. > >> > >> We can temporarily revert my patch, but I would like it to be temporary. > >> Since aarch64 seems to handle the same code just fine, maybe someone who > >> knows arm could copy the relevant code over? > >> > >> Does my message make sense, do people have comments? > > > > So what complicates things now (and to some extent pre-existed when you > > used AVX512 which _could_ operate on boolean vectors) is that we > > have split out the condition from VEC_COND_EXPR to separate stmts > > but we do not expect backends to be able to code-generate the separate > > form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs > > to .VCOND[U] "merging" the compares again. Now that process breaks > > down once we have things like _9 = _1 | _3; - at some point I argued > > that we should handle vector compares [and operations on boolean vectors] > > as well in ISEL but then when it came up again for some reason I > > disregarded that again. > > > > Thus - we don't want to go back to fixing up the generic expansion code > > (which looks at one instruction at a time and is restricted by TER > > single-use > > restrictions). > > Here, it would only be handling comparisons left over by ISEL because they > do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would > be more consistent, but then we may have to teach the vector lowering pass > about this. > > > Instead we want to deal with this in ISEL which should > > behave more intelligently. In the above case it might involve turning > > the _1 and _3 defs into .VCOND [with different result type], doing > > _9 in that type and then somehow dealing with _7 ... but this eventually > > means undoing the match simplification that introduced the code? > > For targets that do not have compact boolean vectors, > VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality. > Removing it to allow more simplifications makes sense, reintroducing it > for expansion can make sense as well, I think it is ok if the second one > reverses the first, but very locally, without having to propagate a change > of type to the uses. So on ARM we could turn _1 and _3 into .VCOND > producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or > does using bool vectors like that seem bad? (maybe they aren't guaranteed > to be equivalent to signed integer vectors with values 0 and -1 and we > need to query the target to know if that is the case, or introduce an > extra .VCOND) > > Also, we only want to replace comparisons with .VCOND if the target > doesn't handle the comparison directly. For AVX512, we do want to produce > SImode bool vectors (for k* registers) and operate on them directly, > that's the whole point of introducing the bool vectors (if bool vectors > were only used to feed VEC_COND_EXPR and were all turned into .VCOND > before expansion, I don't see the point of specifying different types for > bool vectors for AVX512 vs non-AVX512, as it would make no difference on > what is passed to the backend). > > I think targets should provide some number of operations, including for > instance bit_and and bit_ior on bool vectors, and be a bit consistent > about what they provide, it becomes unmanageable in the middle-end > otherwise...
It's currently somewhat of a mess I guess. It might help to restrict the simplification patterns to passes before vector lowering so maybe add something like (or re-use) canonicalize_math[_after_vectorization]_p ()? Richard. > > -- > Marc Glisse