On Thu, 6 Aug 2020, Richard Biener wrote:

On Thu, Aug 6, 2020 at 10:17 AM Christophe Lyon
<christophe.l...@linaro.org> wrote:

Hi,


On Wed, 5 Aug 2020 at 16:24, Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse <marc.gli...@inria.fr> wrote:

New version that passed bootstrap+regtest during the night.

When vector comparisons were forced to use vec_cond_expr, we lost a number of
optimizations (my fault for not adding enough testcases to prevent that).
This patch tries to unwrap vec_cond_expr a bit so some optimizations can
still happen.

I wasn't planning to add all those transformations together, but adding one
caused a regression, whose fix introduced a second regression, etc.

Restricting to constant folding would not be sufficient; we also need at
least things like X|0 or X&X. The transformations are quite conservative:
they use :s and fold only if everything simplifies. We may want to relax
this later. And of course we are going to miss things like
(a?b:c) + (a?c:b) -> b+c.
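
For readers following along, here is a small sketch of the identities
involved, written with the GCC/Clang vector extension rather than GIMPLE.
It is not taken from the patch. The helper names (vsel, ior_outside,
ior_inside, sum_of_swapped_selects, same_lanes) are made up for
illustration, and the select is modelled with mask arithmetic on the
assumption that every mask lane is all-ones or zero.

```c
#include <stdint.h>

/* Four 32-bit lanes, using the GCC/Clang vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: lane-wise select.
   Assumes each lane of m is all-ones (-1) or zero. */
static v4si vsel(v4si m, v4si a, v4si b)
{
    return (m & a) | (~m & b);
}

/* (c ? a : b) | d, with the operation applied to the select result. */
static v4si ior_outside(v4si c, v4si a, v4si b, v4si d)
{
    return vsel(c, a, b) | d;
}

/* c ? (a | d) : (b | d), with the operation pushed into both arms. */
static v4si ior_inside(v4si c, v4si a, v4si b, v4si d)
{
    return vsel(c, a | d, b | d);
}

/* (c ? a : b) + (c ? b : a), which is a + b in every lane. */
static v4si sum_of_swapped_selects(v4si c, v4si a, v4si b)
{
    return vsel(c, a, b) + vsel(c, b, a);
}

/* Returns 1 if every lane matches, 0 otherwise. */
static int same_lanes(v4si x, v4si y)
{
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

When d is all zeros, both arms of ior_inside simplify (X|0 is X), which
is the kind of case where pushing the operation into the arms pays off.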

In terms of number of operations, some transformations turning 2
VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look
like a gain... I expect the bit_not disappears in most cases, and
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
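
To make the operation counts concrete, here is a sketch of two nested
selects being merged into a single one. The first pair matches the
"c1 ? c2 ? a : b : b" form named in the ChangeLog. The second pair is
only my guess at a shape that ends up as one select plus an IOR and a
NOT; I have not checked it against match.pd. All function names are
hypothetical, and the masks are assumed to be all-ones or zero per lane.

```c
#include <stdint.h>

/* Four 32-bit lanes, using the GCC/Clang vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: lane-wise select.
   Assumes each lane of m is all-ones (-1) or zero. */
static v4si vsel(v4si m, v4si a, v4si b)
{
    return (m & a) | (~m & b);
}

/* c1 ? (c2 ? a : b) : b, written as two selects. */
static v4si nested_and(v4si c1, v4si c2, v4si a, v4si b)
{
    return vsel(c1, vsel(c2, a, b), b);
}

/* (c1 & c2) ? a : b, one select plus an AND. */
static v4si merged_and(v4si c1, v4si c2, v4si a, v4si b)
{
    return vsel(c1 & c2, a, b);
}

/* c1 ? (c2 ? a : b) : a, written as two selects. */
static v4si nested_ornot(v4si c1, v4si c2, v4si a, v4si b)
{
    return vsel(c1, vsel(c2, a, b), a);
}

/* (~c1 | c2) ? a : b, one select plus an IOR and a NOT. */
static v4si merged_ornot(v4si c1, v4si c2, v4si a, v4si b)
{
    return vsel(~c1 | c2, a, b);
}

/* Returns 1 if every lane matches, 0 otherwise. */
static int same_lanes(v4si x, v4si y)
{
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

If c1 is itself the result of a comparison, the NOT can presumably be
absorbed by inverting that comparison, which is why I expect it to
disappear in most cases.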

I am a bit confused that with avx512 we get types like "vector(4)
<signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
-1?), but that doesn't matter for this patch.

OK.

Thanks,
Richard.

2020-08-05  Marc Glisse  <marc.gli...@inria.fr>

        PR tree-optimization/95906
        PR target/70314
        * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
        (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
        (op (c ? a : b)): Update to match the new transformations.

        * gcc.dg/tree-ssa/andnot-2.c: New file.
        * gcc.dg/tree-ssa/pr95906.c: Likewise.
        * gcc.target/i386/pr70314.c: Likewise.


I think this patch is causing several ICEs on arm-none-linux-gnueabihf
--with-cpu cortex-a9 --with-fpu neon-fp16:
  Executed from: gcc.c-torture/compile/compile.exp
    gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
    gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
  Executed from: gcc.dg/dg.exp
    gcc.dg/pr87746.c (internal compiler error)
  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
    gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/pr59591-1.c (internal compiler error)
    gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/pr86927.c (internal compiler error)
    gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/slp-cond-5.c (internal compiler error)
    gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-23.c (internal compiler error)
    gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-24.c (internal compiler error)
    gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal compiler error)

Backtrace for gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
during RTL pass: expand
/gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal compiler error: in do_store_flag, at expr.c:12259
0x8feb26 do_store_flag
        /gcc/expr.c:12259
0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9617
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282

Hmm, I guess we might need to verify that the VEC_COND_EXPRs
can be RTL expanded, at least if the folding triggers after vector
lowering (but needing to lower a previously expandable VEC_COND_EXPR
would be similarly bad).  So we may need to handle VEC_COND_EXPRs
like VEC_PERMs and thus need to check target support.  Ick.

Maybe. I'd like to see what the gimple that arm fails to expand looks like,
and whether that's really a limitation in the hardware or just some simple
missing case in the target or the expansion code. Is it that we had
(a<b)?-1:0, which arm can handle, and because of the transformation we now
have to expand a plain c=a<b, which arm cannot handle?
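
To spell out the two forms I have in mind, here is a source-level sketch
using the GCC/Clang vector extension. It only shows that the two forms
compute the same lanes. It says nothing about which of them the arm
backend can expand, which is exactly the open question. The function
names cmp_plain and cmp_select are hypothetical.

```c
#include <stdint.h>

/* Four 32-bit lanes, using the GCC/Clang vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Plain c = a < b: each lane is -1 where a < b and 0 otherwise. */
static v4si cmp_plain(v4si a, v4si b)
{
    return a < b;
}

/* (a < b) ? -1 : 0, with the select modelled by mask arithmetic. */
static v4si cmp_select(v4si a, v4si b)
{
    v4si m = a < b;
    v4si minus_one = {-1, -1, -1, -1};
    v4si zero = {0, 0, 0, 0};
    return (m & minus_one) | (~m & zero);
}
```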

If someone can confirm the breakage, please feel free to revert the patch,
but please also give some details about how this breaks, or provide a
simple way to reproduce.

--
Marc Glisse
