On Tue, Mar 31, 2026 at 2:59 PM Robin Dapp <[email protected]> wrote:
>
> Hi,
>
> (GCC 17)
>
> Currently, for a situation like
>
>   _1 = BIT_FIELD_REF <src1_9(D), 32, 0>;
>   _2 = BIT_FIELD_REF <src1_9(D), 32, 32>;
>   _3 = BIT_FIELD_REF <src1_9(D), 32, 64>;
>   _4 = BIT_FIELD_REF <src1_9(D), 32, 96>;
>   _5 = BIT_FIELD_REF <src2_16(D), 32, 0>;
>   _6 = BIT_FIELD_REF <src2_16(D), 32, 32>;
>   _7 = BIT_FIELD_REF <src2_16(D), 32, 64>;
>   _8 = BIT_FIELD_REF <src2_16(D), 32, 96>;
>   _21 = {_1, _2, _3, _4, _5, _6, _7, _8};
>
> we give up because the vec_init, looking through the BIT_FIELD_REFs, sees
> two source vectors but we only expect one.
>
> This patch adds support for a second one, leading to a permutation being
> created.  We can do the same optimization in forwprop but that's too
> late as we have already confused vect costing by the (usually) expensive
> vec_init.
>
> Without changes, this causes a regression on x86 (pr54400.c):
>
>   __m128d i4 (__m128d p, __m128d q)
>   {
>     __m128d r = { p[1] + p[0], q[1] + q[0] };
>     return r;
>   }
>
> The BIT_FIELD_EXPRs are expanded to vec_selects, the constructor to 
> vec_concat.
> We recognize this in combine as the appropriate insn pattern.  I took the
> shotgun approach and just added all now-necessary patterns.

I wonder if we can deal with this case during RTL expansion, basically
detect a concat permute and see if there's vec_init of vector components
supported and use that?  The alternative to new patterns would be to
make vec_perm_const of the target emit a concat?

Or did I misunderstand the issue?

> I suppose this kind of fallout is unavoidable, in particular for small vectors
> that we can "naturally" combine into others.  In such situations the
> two-vector/permute approach is at a disadvantage.  For larger vectors, though,
> the vec_select/vec_concat approach doesn't scale.
>
> Bootstrapped and regtested on x86, power10, and aarch64.
> Regtested on riscv64.
>
> Regards
>  Robin
>
>         PR tree-optimization/105816
>
> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (compatible_bit_field_ref_source): New helper
>         function.
>         (vect_build_slp_tree_1): Use helper.
>         (build_perm_source_node): New helper function.
>         (vect_build_slp_tree_2): Use helper and allow a second source
>         vector.
>         * config/i386/sse.md (*sse3_haddv2df3_perm): New pattern to
>         recognize interleaved/permuted hsub/hadd.
>         (*sse3_hsubv2df3_perm): Ditto.
>         (*sse3_haddv2df3_perm_rev): Ditto.
>         (*sse3_haddv2df3_merge): Ditto.
>         (*sse3_haddv2df3_merge_rev): Ditto.
>         (*sse3_hsubv2df3_perm): Ditto.
>         (*sse3_hsubv2df3_perm_rev): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/pr105816.c: New test.
>
> Signed-off-by: Robin Dapp <[email protected]>
> ---
>  gcc/config/i386/sse.md               | 141 +++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/vect/pr105816.c |  23 +++++
>  gcc/tree-vect-slp.cc                 | 139 ++++++++++++++++++--------
>  3 files changed, 263 insertions(+), 40 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr105816.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a3f68ad9c1a..b32ea0657de 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -3844,6 +3844,147 @@ (define_insn "*sse3_hsubv2df3_low"
>     (set_attr "prefix" "orig,vex")
>     (set_attr "mode" "V2DF")])
>
> +; The following patterns help recognize
> +;   (plus/minus interleave_even interleave_odd)
> +; as h(sub/add).
> +(define_insn_and_split "*sse3_haddv2df3_perm"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (plus:V2DF
> +      (vec_select:V2DF
> +        (vec_concat:V4DF
> +          (match_operand:V2DF 1 "register_operand")
> +          (match_operand:V2DF 2 "vector_operand"))
> +        (parallel [(const_int 0) (const_int 2)]))
> +      (vec_select:V2DF
> +        (vec_concat:V4DF (match_dup 1) (match_dup 2))
> +        (parallel [(const_int 1) (const_int 3)]))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (plus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (plus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
> +(define_insn_and_split "*sse3_haddv2df3_perm_rev"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (plus:V2DF
> +      (vec_select:V2DF
> +        (vec_concat:V4DF
> +          (match_operand:V2DF 1 "register_operand")
> +          (match_operand:V2DF 2 "vector_operand"))
> +        (parallel [(const_int 1) (const_int 3)]))
> +      (vec_select:V2DF
> +        (vec_concat:V4DF (match_dup 1) (match_dup 2))
> +        (parallel [(const_int 0) (const_int 2)]))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (plus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (plus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
> +(define_insn_and_split "*sse3_haddv2df3_merge"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (plus:V2DF
> +      (vec_select:V2DF
> +        (vec_concat:V4DF
> +          (match_operand:V2DF 1 "register_operand")
> +          (match_operand:V2DF 2 "vector_operand"))
> +        (parallel [(const_int 1) (const_int 2)]))
> +      (vec_merge:V2DF
> +        (match_dup 1)
> +        (match_dup 2)
> +        (const_int 1))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (plus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (plus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
> +(define_insn_and_split "*sse3_haddv2df3_merge_rev"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (plus:V2DF
> +      (vec_merge:V2DF
> +        (match_operand:V2DF 1 "register_operand")
> +        (match_operand:V2DF 2 "vector_operand")
> +        (const_int 1))
> +      (vec_select:V2DF
> +        (vec_concat:V4DF (match_dup 1) (match_dup 2))
> +        (parallel [(const_int 1) (const_int 2)]))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (plus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (plus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
> +(define_insn_and_split "*sse3_hsubv2df3_perm"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (minus:V2DF
> +      (vec_select:V2DF
> +        (vec_concat:V4DF
> +          (match_operand:V2DF 1 "register_operand")
> +          (match_operand:V2DF 2 "vector_operand"))
> +        (parallel [(const_int 0) (const_int 2)]))
> +      (vec_select:V2DF
> +        (vec_concat:V4DF (match_dup 1) (match_dup 2))
> +        (parallel [(const_int 1) (const_int 3)]))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (minus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (minus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
> +(define_insn_and_split "*sse3_hsubv2df3_perm_rev"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +    (minus:V2DF
> +      (vec_select:V2DF
> +        (vec_concat:V4DF
> +          (match_operand:V2DF 1 "register_operand")
> +          (match_operand:V2DF 2 "vector_operand"))
> +        (parallel [(const_int 1) (const_int 3)]))
> +      (vec_select:V2DF
> +        (vec_concat:V4DF (match_dup 1) (match_dup 2))
> +        (parallel [(const_int 0) (const_int 2)]))))]
> +  "TARGET_SSE3"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +    (vec_concat:V2DF
> +      (minus:DF
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
> +      (minus:DF
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 0)]))
> +        (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))])
> +
>  (define_insn "avx_h<insn>v8sf3"
>    [(set (match_operand:V8SF 0 "register_operand" "=x")
>         (vec_concat:V8SF
> diff --git a/gcc/testsuite/gcc.dg/vect/pr105816.c 
> b/gcc/testsuite/gcc.dg/vect/pr105816.c
> new file mode 100644
> index 00000000000..9f9a24aea4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr105816.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fdump-tree-slp-details" } */
> +
> +void test_lo (short * __restrict dst, short *src1, short *src2, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      dst[0] = src1[0];
> +      dst[1] = src1[1];
> +      dst[2] = src1[2];
> +      dst[3] = src1[3];
> +      dst[4] = src2[0];
> +      dst[5] = src2[1];
> +      dst[6] = src2[2];
> +      dst[7] = src2[3];
> +      dst+=8;
> +      src1+=4;
> +      src2+=4;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "different BIT_FIELD_REF arguments" 
> "slp1" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 8fa6a740c96..e71118bb56f 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1088,6 +1088,23 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info 
> stmt_info,
>    return true;
>  }
>
> +/* Return true if VEC is an SSA name, of compatible vector type and its
> +   type of the same size as VECTYPE.  */
> +
> +static bool
> +compatible_bit_field_ref_source (vec_info *vinfo, tree vec, tree vectype)
> +{
> +  return is_a <bb_vec_info> (vinfo)

I'd like to see the bb_vec_info check ...

> +    && TREE_CODE (vec) == SSA_NAME
> +    /* When the element types are not compatible we pun the
> +       source to the target vectype which requires equal size.  */
> +    && ((VECTOR_TYPE_P (TREE_TYPE (vec))
> +        && types_compatible_p (TREE_TYPE (vectype),
> +                               TREE_TYPE (TREE_TYPE (vec))))
> +       || operand_equal_p (TYPE_SIZE (vectype),
> +                           TYPE_SIZE (TREE_TYPE (vec))));
> +}
> +
>  /* Verify if the scalar stmts STMTS are isomorphic, require data
>     permutation or are of unsupported types of operation.  Return
>     true if they are, otherwise return false and indicate in *MATCHES
> @@ -1122,6 +1139,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>    int first_reduc_idx = -1;
>    bool maybe_soft_fail = false;
>    tree soft_fail_nunits_vectype = NULL_TREE;
> +  tree other_bfref_source = NULL_TREE;
>
>    tree vectype, nunits_vectype;
>    if (!vect_get_vector_types_for_stmt (vinfo, first_stmt_info, &vectype,
> @@ -1316,15 +1334,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>                    && rhs_code == BIT_FIELD_REF)
>             {
>               tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> -             if (!is_a <bb_vec_info> (vinfo)
> -                 || TREE_CODE (vec) != SSA_NAME
> -                 /* When the element types are not compatible we pun the
> -                    source to the target vectype which requires equal size.  
> */
> -                 || ((!VECTOR_TYPE_P (TREE_TYPE (vec))
> -                      || !types_compatible_p (TREE_TYPE (vectype),
> -                                              TREE_TYPE (TREE_TYPE (vec))))
> -                     && !operand_equal_p (TYPE_SIZE (vectype),
> -                                          TYPE_SIZE (TREE_TYPE (vec)))))

... here, because it isn't really relevant to compatibility and it helps seeing
what is special to BB vect here.

Otherwise looks good to me, but I cannot approve the x86 parts (and as said,
I'm not sure it is the appropriate solution).

Thanks,
Richard.

> +             if (!compatible_bit_field_ref_source (vinfo, vec, vectype))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -1422,17 +1432,40 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>               continue;
>             }
>
> +         /* Check if the number of distinct BIT_FIELD_REF source operands
> +            is at most 2.  */
>           if (!ldst_p
>               && first_stmt_code == BIT_FIELD_REF
>               && (TREE_OPERAND (gimple_assign_rhs1 (first_stmt_info->stmt), 0)
>                   != TREE_OPERAND (gimple_assign_rhs1 (stmt_info->stmt), 0)))
>             {
> -             if (dump_enabled_p ())
> -               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                "Build SLP failed: different BIT_FIELD_REF "
> -                                "arguments in %G", stmt);
> -             /* Mismatch.  */
> -             continue;
> +             tree cur = TREE_OPERAND (gimple_assign_rhs1 (stmt_info->stmt), 
> 0);
> +             if (!other_bfref_source)
> +               {
> +                 if (!compatible_bit_field_ref_source (vinfo, cur, vectype))
> +                   {
> +                     if (dump_enabled_p ())
> +                       dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                                        vect_location,
> +                                        "Build SLP failed: different "
> +                                        "BIT_FIELD_REF arguments in "
> +                                        "%G", stmt);
> +                     /* Mismatch.  */
> +                     continue;
> +                   }
> +                 other_bfref_source = cur;
> +               }
> +             else if (cur != other_bfref_source)
> +               {
> +                 if (dump_enabled_p ())
> +                   dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                                    vect_location,
> +                                    "Build SLP failed: different "
> +                                    "BIT_FIELD_REF arguments in "
> +                                    "%G", stmt);
> +                 /* Mismatch.  */
> +                 continue;
> +               }
>             }
>
>           if (call_stmt
> @@ -1962,6 +1995,40 @@ vect_slp_build_two_operator_nodes (slp_tree perm, tree 
> vectype,
>    SLP_TREE_CHILDREN (perm).quick_push (child2);
>  }
>
> +/* Helper for creating a permutation source node when converting a
> +    vec[0] = BIT_FIELD_REF (vec, ...)
> +   into a vector permutation.
> +
> +   Initializes an SLP_TREE with the type of VECTYPE, sets its number of 
> units,
> +   and adds VEC to its VEC_DEFs, returning the new node.  */
> +
> +static slp_tree
> +build_perm_source_node (tree vec, tree vectype)
> +{
> +  slp_tree vnode = vect_create_new_slp_node (vNULL);
> +  if (operand_equal_p (TYPE_SIZE (vectype), TYPE_SIZE (TREE_TYPE (vec))))
> +    /* ???  We record vectype here but we hide eventually necessary
> +       punning and instead rely on code generation to materialize
> +       VIEW_CONVERT_EXPRs as necessary.  We instead should make
> +       this explicit somehow.  */
> +    SLP_TREE_VECTYPE (vnode) = vectype;
> +  else
> +    {
> +      /* For different size but compatible elements we can still
> +        use VEC_PERM_EXPR without punning.  */
> +      gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))
> +                 && types_compatible_p (TREE_TYPE (vectype),
> +                                        TREE_TYPE (TREE_TYPE (vec))));
> +      SLP_TREE_VECTYPE (vnode) = TREE_TYPE (vec);
> +    }
> +  auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
> +  unsigned HOST_WIDE_INT const_nunits;
> +  if (nunits.is_constant (&const_nunits))
> +    SLP_TREE_LANES (vnode) = const_nunits;
> +  SLP_TREE_VEC_DEFS (vnode).safe_push (vec);
> +  return vnode;
> +}
> +
>  /* Recursively build an SLP tree starting from NODE.
>     Fail (and return a value not equal to zero) if def-stmts are not
>     isomorphic, require data permutation or are of unsupported types of
> @@ -2195,15 +2262,19 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>            && !gimple_vuse (stmt_info->stmt)
>            && gimple_assign_rhs_code (stmt_info->stmt) == BIT_FIELD_REF)
>      {
> -      /* vect_build_slp_tree_2 determined all BIT_FIELD_REFs reference
> -        the same SSA name vector of a compatible type to vectype.  */
> +      /* vect_build_slp_tree_1 determined all BIT_FIELD_REFs reference
> +        at most two SSA name vectors of a compatible type to vectype.  */
>        vec<std::pair<unsigned, unsigned> > lperm = vNULL;
>        tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt_info->stmt), 0);
> +      tree other_vec = NULL_TREE;
>        stmt_vec_info estmt_info;
>        FOR_EACH_VEC_ELT (stmts, i, estmt_info)
>         {
>           gassign *estmt = as_a <gassign *> (estmt_info->stmt);
>           tree bfref = gimple_assign_rhs1 (estmt);
> +         tree cur_vec = TREE_OPERAND (bfref, 0);
> +         if (cur_vec != vec)
> +           other_vec = cur_vec;
>           HOST_WIDE_INT lane;
>           if (!known_eq (bit_field_size (bfref),
>                          tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE 
> (vectype))))
> @@ -2214,39 +2285,27 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>               matches[0] = false;
>               return NULL;
>             }
> -         lperm.safe_push (std::make_pair (0, (unsigned)lane));
> +         lperm.safe_push (std::make_pair (cur_vec == vec ? 0 : 1,
> +                                          (unsigned) lane));
>         }
> -      slp_tree vnode = vect_create_new_slp_node (vNULL);
> -      if (operand_equal_p (TYPE_SIZE (vectype), TYPE_SIZE (TREE_TYPE (vec))))
> -       /* ???  We record vectype here but we hide eventually necessary
> -          punning and instead rely on code generation to materialize
> -          VIEW_CONVERT_EXPRs as necessary.  We instead should make
> -          this explicit somehow.  */
> -       SLP_TREE_VECTYPE (vnode) = vectype;
> -      else
> -       {
> -         /* For different size but compatible elements we can still
> -            use VEC_PERM_EXPR without punning.  */
> -         gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))
> -                     && types_compatible_p (TREE_TYPE (vectype),
> -                                            TREE_TYPE (TREE_TYPE (vec))));
> -         SLP_TREE_VECTYPE (vnode) = TREE_TYPE (vec);
> -       }
> -      auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
> -      unsigned HOST_WIDE_INT const_nunits;
> -      if (nunits.is_constant (&const_nunits))
> -       SLP_TREE_LANES (vnode) = const_nunits;
> -      SLP_TREE_VEC_DEFS (vnode).safe_push (vec);
> +      slp_tree vnode = build_perm_source_node (vec, vectype);
> +      slp_tree vnode2 = NULL;
> +      if (other_vec)
> +       vnode2 = build_perm_source_node (other_vec, vectype);
> +
>        /* We are always building a permutation node even if it is an identity
>          permute to shield the rest of the vectorizer from the odd node
>          representing an actual vector without any scalar ops.
>          ???  We could hide it completely with making the permute node
>          external?  */
> -      node = vect_create_new_slp_node (node, stmts, 1);
> +      node = vect_create_new_slp_node (node, stmts, !other_vec ? 1 : 2);
>        SLP_TREE_CODE (node) = VEC_PERM_EXPR;
>        SLP_TREE_LANE_PERMUTATION (node) = lperm;
>        SLP_TREE_VECTYPE (node) = vectype;
>        SLP_TREE_CHILDREN (node).quick_push (vnode);
> +      if (other_vec)
> +       SLP_TREE_CHILDREN (node).quick_push (vnode2);
> +
>        return node;
>      }
>    /* When discovery reaches an associatable operation see whether we can
> --
> 2.53.0

Reply via email to