Hi!

The following testcase ICEs because, for SSE4.1, only VEC_COND_EXPRs with EQ_EXPR/NE_EXPR conditions are supported. The vectorizer generates such a VEC_COND_EXPR, but later on the condition is folded into a VECTOR_CST, and the VEC_COND_EXPR expansion code expands non-comparison conditions as LT_EXPR against a zero vector.
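
To illustrate why a constant condition can be read either as "element < 0" or as "element != 0", here is a minimal standalone sketch. It is not GCC code: the helper name lt_zero_matches_ne_zero and the use of a plain long long array in place of a VECTOR_CST are assumptions made purely for illustration. The two readings select the same lanes exactly when no element is positive, which holds for canonical mask values (0 and -1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns 1 when interpreting
   every element of a constant mask as "x < 0" selects the same lanes
   as interpreting it as "x != 0", and 0 otherwise.  The two readings
   disagree only for strictly positive elements.  */
int
lt_zero_matches_ne_zero (const long long *elts, size_t n)
{
  assert (elts != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    {
      int as_lt = elts[i] < 0;   /* lane selected under LT against zero */
      int as_ne = elts[i] != 0;  /* lane selected under NE against zero */
      if (as_lt != as_ne)
        return 0;
    }
  return 1;
}
```

For a mask such as { 0, -1 } the helper returns 1, so either comparison could be used; for { 0, 1 } it returns 0, because the second lane would be selected under "!= 0" but not under "< 0".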
I think the only problematic case is when the equality comparison is folded into a constant; at that point, if both other VEC_COND_EXPR arguments are constant, we could in theory fold it (but can't really rely on it during expansion anyway), but if they aren't constant, just the condition is, there is nothing to fold it into anyway.

The patch verifies that LT_EXPR against zero will behave the same as NE_EXPR by punting if there are non-canonical elements (> 0), otherwise just tries to expand it as NE_EXPR if LT_EXPR didn't work.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-01  Jakub Jelinek  <ja...@redhat.com>

        PR middle-end/91623
        * optabs.c (expand_vec_cond_expr): If op0 is a VECTOR_CST and only
        EQ_EXPR/NE_EXPR is supported, verify that op0 only contains
        zeros or negative elements and use NE_EXPR instead of LT_EXPR against
        zero vector.

        * gcc.target/i386/pr91623.c: New test.

--- gcc/optabs.c.jj     2019-08-27 12:26:37.392912813 +0200
+++ gcc/optabs.c        2019-08-31 19:49:32.831430056 +0200
@@ -5868,6 +5868,25 @@ expand_vec_cond_expr (tree vec_cond_type
   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
   if (icode == CODE_FOR_nothing)
     {
+      if (tcode == LT_EXPR
+          && op0a == op0
+          && TREE_CODE (op0) == VECTOR_CST)
+        {
+          /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
+             into a constant when only get_vcond_eq_icode is supported.
+             Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
+          unsigned HOST_WIDE_INT nelts;
+          if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
+            {
+              if (VECTOR_CST_STEPPED_P (op0))
+                return 0;
+              nelts = vector_cst_encoded_nelts (op0);
+            }
+          for (unsigned int i = 0; i < nelts; ++i)
+            if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
+              return 0;
+          tcode = NE_EXPR;
+        }
       if (tcode == EQ_EXPR || tcode == NE_EXPR)
         icode = get_vcond_eq_icode (mode, cmp_op_mode);
       if (icode == CODE_FOR_nothing)
--- gcc/testsuite/gcc.target/i386/pr91623.c.jj  2019-08-31 19:55:02.470674149 +0200
+++ gcc/testsuite/gcc.target/i386/pr91623.c     2019-08-31 19:54:39.186010098 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/91623 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse4.1 -mno-sse4.2" } */
+
+typedef long long V __attribute__((__vector_size__(16)));
+V e, h;
+int d;
+const int i;
+
+void foo (void);
+
+void
+bar (int k, int l)
+{
+  if (d && 0 <= k - 1 && l)
+    foo ();
+}
+
+void
+baz (void)
+{
+  V n = (V) { 1 };
+  V g = (V) {};
+  V o = g;
+  for (int f = 0; f < i; ++f)
+    {
+      V a = o == n;
+      h = a;
+      bar (f, i);
+      o = e;
+    }
+}

        Jakub