Re: VEC_COND_EXPR optimizations v2

Marc Glisse Thu, 06 Aug 2020 04:13:07 -0700

On Thu, 6 Aug 2020, Richard Biener wrote:

On Thu, Aug 6, 2020 at 10:17 AM Christophe Lyon
<[email protected]> wrote:


Hi,


On Wed, 5 Aug 2020 at 16:24, Richard Biener via Gcc-patches
<[email protected]> wrote:


On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse <[email protected]> wrote:


New version that passed bootstrap+regtest during the night.

When vector comparisons were forced to use vec_cond_expr, we lost a number of
optimizations (my fault for not adding enough testcases to prevent that).
This patch tries to unwrap vec_cond_expr a bit so some optimizations can
still happen.

I wasn't planning to add all those transformations together, but adding one
caused a regression, whose fix introduced a second regression, etc.

Restricting to constant folding would not be sufficient; we also need at
least things like X|0 or X&X. The transformations are quite conservative,
using :s and folding only if everything simplifies; we may want to relax
this later. And of course we are going to miss things like a?b:c + a?c:b
-> b+c.

In terms of number of operations, some transformations turning two
VEC_COND_EXPRs into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look
like a gain... I expect the BIT_NOT_EXPR to disappear in most cases, and a
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.

I am a bit confused that with avx512 we get types like "vector(4)
<signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
-1?), but that doesn't matter for this patch.


OK.

Thanks,
Richard.

2020-08-05  Marc Glisse  <[email protected]>

        PR tree-optimization/95906
        PR target/70314
        * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
        (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
        (op (c ? a : b)): Update to match the new transformations.

        * gcc.dg/tree-ssa/andnot-2.c: New file.
        * gcc.dg/tree-ssa/pr95906.c: Likewise.
        * gcc.target/i386/pr70314.c: Likewise.


I think this patch is causing several ICEs on arm-none-linux-gnueabihf
--with-cpu cortex-a9 --with-fpu neon-fp16:
  Executed from: gcc.c-torture/compile/compile.exp
    gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
    gcc.c-torture/compile/20160205-1.c   -O3 -g  (internal compiler error)
  Executed from: gcc.dg/dg.exp
    gcc.dg/pr87746.c (internal compiler error)
  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
    gcc.dg/tree-ssa/ifc-cd.c (internal compiler error)
  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/pr59591-1.c (internal compiler error)
    gcc.dg/vect/pr59591-1.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/pr86927.c (internal compiler error)
    gcc.dg/vect/pr86927.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/slp-cond-5.c (internal compiler error)
    gcc.dg/vect/slp-cond-5.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-23.c (internal compiler error)
    gcc.dg/vect/vect-23.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-24.c (internal compiler error)
    gcc.dg/vect/vect-24.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c (internal compiler error)
    gcc.dg/vect/vect-cond-reduc-6.c -flto -ffat-lto-objects (internal compiler error)

Backtrace for gcc.c-torture/compile/20160205-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
during RTL pass: expand
/gcc/testsuite/gcc.c-torture/compile/20160205-1.c:2:5: internal compiler error: in do_store_flag, at expr.c:12259
0x8feb26 do_store_flag
        /gcc/expr.c:12259
0x900201 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9617
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282
0x91174e expand_operands(tree_node*, tree_node*, rtx_def*, rtx_def**, rtx_def**, expand_modifier)
        /gcc/expr.c:8065
0x8ff543 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /gcc/expr.c:9950
0x908cd0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /gcc/expr.c:10159
0x91174e expand_expr
        /gcc/expr.h:282


Hmm, I guess we might need to verify that the VEC_COND_EXPRs
can be RTL expanded, at least if the folding triggers after vector
lowering (but needing to lower a previously expandable VEC_COND_EXPR
would be similarly bad).  So we may need to handle VEC_COND_EXPRs
like VEC_PERMs and thus need to check target support.  Ick.

Maybe. I'd like to see what the gimple that arm fails to expand looks like, and whether that's really a limitation in the hardware or just some simple missing case in the target or the expansion code. Is it that we had (a<b)?-1:0, which arm can handle, and because of the transformation we now have to expand a plain c=a<b, which arm cannot handle?

If someone can confirm the breakage, please feel free to revert that patch, but also please give some details about how this breaks or provide a simple way to reproduce.


--
Marc Glisse
