https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103948
Bug ID: 103948
Summary: Vectorizer does not use vec_cmpMN without vcondMN pattern
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
I was trying to add a V2QI vec_cmpv2qiv2qi pattern to x86:
(define_expand "vec_cmpv2qiv2qi"
  [(set (match_operand:V2QI 0 "register_operand")
        (match_operator:V2QI 1 ""
          [(match_operand:V2QI 2 "register_operand")
           (match_operand:V2QI 3 "register_operand")]))]
  "TARGET_SSE2"
{
  bool ok = ix86_expand_int_vec_cmp (operands);
  gcc_assert (ok);
  DONE;
})
but the vectorizer does not consider the above pattern *unless* vcondv2qiv2qi
is also present:
(define_expand "vcondv2qiv2qi"
  [(set (match_operand:V2QI 0 "register_operand")
        (if_then_else:V2QI
          (match_operator 3 ""
            [(match_operand:V2QI 4 "register_operand")
             (match_operand:V2QI 5 "register_operand")])
          (match_operand:V2QI 1)
          (match_operand:V2QI 2)))]
  "TARGET_SSE4_1")
As shown above, the pattern does not need to expand to anything, just needs to
be present.
So the following testcase:
--cut here--
typedef signed char vec __attribute__((vector_size(2)));
vec lt (vec a, vec b) { return a < b; }
--cut here--
vectorizes with -msse4 and fails to vectorize with -msse2.
Looking a bit into tree-vect-generic.c, in expand_vector_comparison we do:
/* Try to expand vector comparison expression OP0 CODE OP1 by
querying optab if the following expression:
VEC_COND_EXPR< OP0 CODE OP1, {-1,...}, {0,...}>
can be expanded. */
but apparently only via the vcondMN optab.
According to the documentation, vec_cmpMN does exactly the above:
'vec_cmpMN'
Output a vector comparison. Operand 0 of mode N is the destination
for predicate in operand 1 which is a signed vector comparison with
operands of mode M in operands 2 and 3. Predicate is computed by
element-wise evaluation of the vector comparison with a truth value
of all-ones and a false value of all-zeros.
So the support check should query the vec_cmpMN optab (and vec_cmpeqMN) in
addition to the vcondMN optab.
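
To make the documented semantics concrete, here is a minimal sketch in GNU C
using the same 2-byte vector type as the testcase. lt_ref is a hypothetical
scalar reference written only for this illustration (it is not part of GCC or
of the patch); it spells out the all-ones / all-zeros result per element, which
is what both VEC_COND_EXPR< OP0 CODE OP1, {-1,...}, {0,...}> and vec_cmpMN are
described as producing.

```c
/* Same vector type as in the testcase: two signed chars.  */
typedef signed char vec __attribute__((vector_size(2)));

/* The comparison from the testcase.  With GNU vector extensions each
   result element is -1 (all bits set) when true and 0 when false.  */
vec lt (vec a, vec b) { return a < b; }

/* Hypothetical scalar reference, for illustration only: the same
   result written element by element.  */
vec lt_ref (vec a, vec b)
{
  vec r = { 0, 0 };
  for (int i = 0; i < 2; i++)
    r[i] = a[i] < b[i] ? -1 : 0;
  return r;
}
```

For a = { 1, 5 } and b = { 2, 3 } both functions should return { -1, 0 },
independent of which optab the compiler ends up using to expand the
comparison.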
I'll attach the complete patch to illustrate the issue on x86_64.