On 23/10/2023 11:43, Richard Biener wrote:
On Fri, 20 Oct 2023, Andrew Stubbs wrote:

This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
mask enable extra lanes that were supposed to be unused and are therefore
undefined.

Richi suggested an alternative approach involving narrower types and then a
zero-extend to the actual mask type.  This solved the problem for the specific
test case that I had, but I'm not sure if it would work with V2 and V4 modes
(not that I've observed bad behaviour from them anyway, but still).  There
were some other caveats involving "two-lane division" that I don't fully
understand, so I went with the simpler implementation.

This patch does have the disadvantage of an additional "and" instruction in
the non-constant case even for machines that don't need it. I'm not sure how
to fix that without an additional target hook. (If GCC could use the 64-lane
vectors more effectively without the assistance of artificially reduced sizes
then this problem wouldn't exist.)

OK to commit?

-           convert_move (target, op0, 0);
+           rtx tmp = gen_reg_rtx (mode);
+           convert_move (tmp, op0, 0);
+
+           if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+                         GET_MODE_PRECISION (mode)))

Usually this would be maybe_ne, but then ...

+             {
+               /* Ensure no excess bits are set.
+                  GCN needs this, AVX does not.  */
+               expand_binop (mode, and_optab, tmp,
+                             GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
+                                             .to_constant())) - 1),
+                             target, true, OPTAB_DIRECT);

here you have .to_constant ().  I think with having an integer mode
we know subparts is constant so I'd prefer

             auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
             if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
...

+             }
+           else
+             emit_move_insn (target, tmp);

note you need the emit_move_insn also for the expand_binop
path since it's not guaranteed that 'target' is used there.  Thus

   tmp = expand_binop (...)
   if (tmp != target)
     emit_move_insn (...)

Otherwise looks good to me.  The and is needed on x86 for
two and four bit masks, it would be more efficient to use
smaller modes for the sign-extension I guess.

I think this patch addresses these issues. I've confirmed it does the right thing on amdgcn.

OK?

Andrew
vect: Don't set excess bits in unform masks

AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.

This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.

gcc/ChangeLog:

        * expr.cc (store_constructor): Add "and" operation to uniform mask
        generation.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ed4dbb13d89..3e2a678710d 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7470,7 +7470,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
            break;
          }
        /* Use sign-extension for uniform boolean vectors with
-          integer modes.  */
+          integer modes.  Effectively "vec_duplicate" for bitmasks.  */
        if (!TREE_SIDE_EFFECTS (exp)
            && VECTOR_BOOLEAN_TYPE_P (type)
            && SCALAR_INT_MODE_P (mode)
@@ -7479,7 +7479,19 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
          {
            rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
                                 expand_normal (elt));
-           convert_move (target, op0, 0);
+           rtx tmp = gen_reg_rtx (mode);
+           convert_move (tmp, op0, 0);
+
+           /* Ensure no excess bits are set.
+              GCN needs this for nunits < 64.
+              x86 needs this for nunits < 8.  */
+           auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+           if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+             tmp = expand_binop (mode, and_optab, tmp,
+                                 GEN_INT ((1 << nunits) - 1), target,
+                                 true, OPTAB_DIRECT);
+           if (tmp != target)
+             emit_move_insn (target, tmp);
            break;
          }
 

Reply via email to