On Tue, Nov 18, 2025 at 7:14 AM Tamar Christina <[email protected]> wrote:
>
> This patch introduces six new vector cbranch optabs:
>
> 1. vec_cbranch_any and vec_cbranch_all.
> 2. cond_vec_cbranch_any and cond_vec_cbranch_all.
> 3. cond_len_vec_cbranch_any and cond_len_vec_cbranch_all.
>
> Today cbranch can be used for both vector and scalar modes.  In both cases
> it is intended to compare boolean values, either scalar or vector.
>
> The optab documentation does not, however, state that it can only handle
> comparisons against 0.  As a result many targets have added code for the
> vector variant that tries to deal with the case where we branch based on two
> non-zero registers.
>
> However this code can't ever be reached, because the cbranch expansion only
> deals with comparisons against 0 for vectors.  This is because for vectors
> the rest of the compiler has no way to generate a non-zero comparison: the
> vectorizer will always generate a zero comparison, and the C/C++ front-ends
> won't allow vectors to be used in a cbranch as it expects a boolean value.
> ISAs like SVE work around this by requiring you to use an SVE PTEST
> intrinsic, which results in a single scalar boolean value that represents
> the flag values.
>
> e.g. if (svptest_any (..))
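>
> In ACLE terms that looks roughly like this (a minimal sketch; the intrinsics
> are the public <arm_sve.h> ones, the function itself is invented):
>
> #include <arm_sve.h>
>
> int any_eq (svint32_t a, svint32_t b)
> {
>   svbool_t pg = svptrue_b32 ();
>   /* The vector comparison produces a predicate, which C cannot branch on
>      directly.  */
>   svbool_t cmp = svcmpeq_s32 (pg, a, b);
>   /* PTEST collapses the predicate into the scalar boolean the branch
>      needs.  */
>   return svptest_any (pg, cmp);
> }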
>
> The natural question then is: why do we not rewrite the comparison into a
> non-zero comparison at expand time if the target supports it?
>
> The reason is that we can't safely do so.  For an ANY comparison
> (e.g. a != b) this is trivial, but for an ALL comparison (e.g. a == b) we
> would have to both flip the branch and invert the value being compared,
> i.e. we have to turn it into an a != b comparison.
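>
> In scalar terms the ANY/ALL duality looks like this (a minimal sketch in C;
> the helper names are invented for illustration):
>
> /* ANY: true if any element pair differs.  */
> static int any_ne (const int *a, const int *b, int n)
> {
>   for (int i = 0; i < n; i++)
>     if (a[i] != b[i])
>       return 1;
>   return 0;
> }
>
> /* ALL: "all equal" is exactly "not any unequal", which is why expanding an
>    ALL test must invert both the element comparison and the branch sense.  */
> static int all_eq (const int *a, const int *b, int n)
> {
>   return !any_ne (a, b, n);
> }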
>
> But in emit_cmp_and_jump_insns we can't flip the branches anymore, because
> they have already been lowered into a fall-through branch (PC) and a label,
> ready for use in an if_then_else RTL expression.
>
> Now why does any of this matter?  Well, there are three optimizations we
> want to be able to do.
>
> 1. Adv. SIMD does not support a vector !=, as in there is no instruction
>    for it.  For both integer and FP vectors we perform the comparison as EQ
>    and then invert the resulting mask.  Ideally we'd like to replace this
>    with just an XOR and the appropriate branch (see the sketch after this
>    list).
>
> 2. When on an SVE enabled system we would like to use an SVE compare +
>    branch for the Adv. SIMD sequence, which could happen due to cost
>    modelling.  However we can only do so if we know that the values being
>    compared are the boolean masks.  This means we can't really use combine
>    to do this, because combine would have to match the entire sequence
>    including the vector comparisons, as at RTL we've lost the information
>    that VECTOR_BOOLEAN_P would have given us.  This sequence would be too
>    long for combine to match, due to it having to match the compare +
>    branch sequence being generated as well.  It also becomes a bit messy
>    to match ANY and ALL sequences.
>
> 3. For SVE systems we would like to avoid generating the PTEST operation
>    whenever possible.  Because SVE vector integer comparisons already set
>    flags, we don't need the PTEST on an any or all check.  Eliminating
>    this in RTL is difficult, so the best approach is to not generate the
>    PTEST at all when not needed.
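>
> To illustrate optimization 1, here is roughly the current situation in
> intrinsics terms (a hedged sketch using <arm_neon.h>; the function itself
> is invented):
>
> #include <arm_neon.h>
>
> int any_ne_neon (int32x4_t a, int32x4_t b)
> {
>   uint32x4_t eq = vceqq_s32 (a, b); /* CMEQ: lanes of -1 where equal.  */
>   uint32x4_t ne = vmvnq_u32 (eq);   /* The extra inversion to get !=.  */
>   return vmaxvq_u32 (ne) != 0;      /* Reduce to a scalar and branch.  */
> }
>
> With a vec_cbranch_any pattern the branch itself can absorb the inversion,
> and the MVN disappears.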
>
> To handle these three cases the new optabs are added, and the current
> cbranch is no longer required if the target does not need help in
> distinguishing between boolean vector vs data vector operands.
>
> This difference is not important for correctness, but it is for
> optimization.  So I've chosen not to deprecate cbranch_optab but to make it
> completely optional.
>
> I'll try to explain why:
>
> An example is when unrolling is done on Adv. SIMD early break loops.
>
> We generate:
>
> vect__1.8_29 = MEM <vector(4) int> [(int *)_25];
> vect__1.9_31 = MEM <vector(4) int> [(int *)_25 + 16B];
> mask_patt_10.10_32 = vect__1.8_29 == { 124, 124, 124, 124 };
> mask_patt_10.10_33 = vect__1.9_31 == { 124, 124, 124, 124 };
> vexit_reduc_34 = .VEC_TRUNC_ADD_HIGH (mask_patt_10.10_33, mask_patt_10.10_32);
> if (vexit_reduc_34 != { 0, 0, 0, 0 })
>   goto <bb 4>; [5.50%]
> else
>   goto <bb 18>; [94.50%]
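>
> (For reference, a hedged C model of the kind of early-break loop this GIMPLE
> comes from; the needle value 124 matches the compare above, the rest is
> invented:
>
> int find_needle (const int *p, int n)
> {
>   for (int i = 0; i < n; i++)
>     if (p[i] == 124)
>       return i;  /* Early break; the vectorizer unrolls this body 2x.  */
>   return -1;
> }
>
> The two loads and two compares above are the unrolled vector body, and
> .VEC_TRUNC_ADD_HIGH combines the two masks so a single branch can test
> both.)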
>
> And so the new optabs aren't immediately useful here, because the comparison
> can't be done by the optab itself.
>
> As such vec_cbranch_any would be called with vexit_reduc_34 and
> { 0, 0, 0, 0 }; however, since this optab expects to perform the comparison
> itself, we end up with:
>
> ldp q30, q31, [x0], 32
> cmeq v30.4s, v30.4s, v27.4s
> cmeq v31.4s, v31.4s, v27.4s
> addhn v31.4h, v31.4s, v30.4s
> cmtst v31.4h, v31.4h, v31.4h
> fmov x3, d31
> cbz x3, .L2
>
> instead of
>
> ldp q30, q31, [x0], 32
> cmeq v30.4s, v30.4s, v27.4s
> cmeq v31.4s, v31.4s, v27.4s
> addhn v31.4h, v31.4s, v30.4s
> fmov x3, d31
> cbz x3, .L2
>
> because we don't know that the value is already a boolean -1/0 mask.
> Without that knowledge we can't safely omit the compare.
>
> The conversion to a mask is needed because, e.g., it's not valid to drop the
> compare with zero when the vector just contains data:
>
> v30.8h = [ 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 ]
> cmeq v31.8h, v30.8h, #0       // -> v31.8h = [0,0,0,0,0,0,0,0]
> umaxp v31.4s, v31.4s, v31.4s  // pairwise-OR over 0/FFFF masks -> still [0,0,0,0]
> fmov x7, d31                  // x7 = 0
> cbnz x7, .L6                  // NOT taken (correct: there were no zeros)
>
> vs
>
> umaxp v31.4s, v30.4s, v30.4s  // pairwise unsigned max over the raw data:
>                               //   [ max(0x00020001,0x00040003)=0x00040003,
>                               //     max(0x00060005,0x00080007)=0x00080007, ... ]
> fmov x7, d31                  // x7 = 0x0008000700040003 (non-zero)
> cbnz x7, .L6                  // TAKEN (wrongly: there were no zeros)
>
> As such, to avoid the extra compare on boolean vectors, we still need
> cbranch_optab, or the new vec_cbranch_* optabs would need an extra operand
> to indicate what kind of data they hold.  Note that this isn't an issue for
> SVE because SVE has BImode for booleans.
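>
> The same point in scalar terms (a hedged sketch, invented for illustration):
>
> /* Over raw data, "is any element zero" genuinely needs the compare.  */
> static int any_zero_data (const unsigned short *v, int n)
> {
>   for (int i = 0; i < n; i++)
>     if (v[i] == 0)   /* This compare cannot be dropped.  */
>       return 1;
>   return 0;
> }
>
> /* Over a 0/-1 boolean mask, OR-ing the lanes is already the answer, so a
>    further compare against zero is redundant.  */
> static int any_set_mask (const unsigned short *mask, int n)
> {
>   unsigned short acc = 0;
>   for (int i = 0; i < n; i++)
>     acc |= mask[i];
>   return acc != 0;
> }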
>
> With these optabs it's trivial to implement all the optimizations I
> described above.
>
> I.e. with them we can now generate:
>
> .L2:
> ldr q31, [x1, x2]
> add v29.4s, v29.4s, v25.4s
> add v28.4s, v28.4s, v26.4s
> add v31.4s, v31.4s, v30.4s
> str q31, [x1, x2]
> add x1, x1, 16
> cmp x1, 2560
> beq .L1
> .L6:
> ldr q30, [x3, x1]
> cmpeq p15.s, p7/z, z30.s, z27.s
> b.none .L2
>
> and easily prove it correct.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32, -m64) with no
> issues.
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR target/118974
> * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab,
> cond_vec_cbranch_any_optab, cond_vec_cbranch_all_optab,
> cond_len_vec_cbranch_any_optab, cond_len_vec_cbranch_all_optab): New.
> * doc/md.texi: Document them.
> * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
> instead of hardcoded cbranch and support mask and len.
> (emit_cmp_and_jump_insn_1, emit_cmp_and_jump_insns): Implement them.
> (emit_conditional_move, emit_conditional_add, gen_cond_trap): Update
> after changing function signatures to support new optabs.
>
> ---
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index ae5d709bd47945272e6f45f83840e21c68bb6534..e668048a387e146b072d414168c5ed6db3707609 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7664,8 +7664,65 @@ position of Operand 1 to test.  Operand 3 is the @code{code_label} to jump to.
> Conditional branch instruction combined with a compare instruction.
> Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> first and second operands of the comparison, respectively. Operand 3
> +is the @code{code_label} to jump to.  This optab is only used for comparisons of
> +VECTOR_BOOLEAN_TYPE_P values and is never called for data registers.  Data
> +vector operands should use one of the patterns below instead.
> +
> +@cindex @code{vec_cbranch_any@var{mode}} instruction pattern
> +@item @samp{vec_cbranch_any@var{mode}}
> +Conditional branch instruction based on a vector compare that branches
> +when at least one of the elementwise comparisons of the two input
> +vectors is true.
> +Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> +first and second operands of the comparison, respectively. Operand 3
> is the @code{code_label} to jump to.
>
> +@cindex @code{vec_cbranch_all@var{mode}} instruction pattern
> +@item @samp{vec_cbranch_all@var{mode}}
> +Conditional branch instruction based on a vector compare that branches
> +when all of the elementwise comparisons of the two input vectors are true.
> +Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> +first and second operands of the comparison, respectively. Operand 3
> +is the @code{code_label} to jump to.
> +
> +@cindex @code{cond_vec_cbranch_any@var{mode}} instruction pattern
> +@item @samp{cond_vec_cbranch_any@var{mode}}
> +Masked conditional branch instruction based on a vector compare that branches
> +when at least one of the elementwise comparisons of the two input
> +vectors is true.
> +Operand 0 is a comparison operator. Operand 1 is the mask operand.
> +Operand 2 and operand 3 are the first and second operands of the comparison,
> +respectively. Operand 4 is the else value for the masked operation.
> +Operand 5 is the @code{code_label} to jump to.
Hello, I'd like to confirm: is operands[4] also a mask with the same mode as
operands[1], where each bit supplies the else value for the corresponding
element?
I notice the aarch64 backend patch defines operands[4] as
aarch64_simd_imm_zero, but it can be any mask, right?
> +
> +@cindex @code{cond_vec_cbranch_all@var{mode}} instruction pattern
> +@item @samp{cond_vec_cbranch_all@var{mode}}
> +Masked conditional branch instruction based on a vector compare that branches
> +when all of the elementwise comparisons of the two input vectors are true.
> +Operand 0 is a comparison operator. Operand 1 is the mask operand.
> +Operand 2 and operand 3 are the first and second operands of the comparison,
> +respectively. Operand 4 is the else value for the masked operation.
> +Operand 5 is the @code{code_label} to jump to.
> +
> +@cindex @code{cond_len_vec_cbranch_any@var{mode}} instruction pattern
> +@item @samp{cond_len_vec_cbranch_any@var{mode}}
> +Len based conditional branch instruction based on a vector compare that branches
> +when at least one of the elementwise comparisons of the two input
> +vectors is true.
> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the first and
> +second operands of the comparison, respectively.  Operand 3 is the len operand.
> +Operand 4 is the else value for the masked operation. Operand 5 is the
> +@code{code_label} to jump to.
> +
> +@cindex @code{cond_len_vec_cbranch_all@var{mode}} instruction pattern
> +@item @samp{cond_len_vec_cbranch_all@var{mode}}
> +Len based conditional branch instruction based on a vector compare that branches
> +when all of the elementwise comparisons of the two input vectors are true.
> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the first and
> +second operands of the comparison, respectively.  Operand 3 is the len operand.
> +Operand 4 is the else value for the masked operation. Operand 5 is the
> +@code{code_label} to jump to.
> +
> @cindex @code{jump} instruction pattern
> @item @samp{jump}
> A jump inside a function; an unconditional branch. Operand 0 is the
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 0865fc2e19aeb2b3056c8634334d6c1644a3cc96..1072239fef086e4ed959e472f299ed048fd507ad 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -48,6 +48,8 @@ along with GCC; see the file COPYING3. If not see
> #include "langhooks.h"
> #include "gimple.h"
> #include "ssa.h"
> +#include "tree-ssa-live.h"
> +#include "tree-outof-ssa.h"
>
> static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
> machine_mode *);
> @@ -4405,6 +4407,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
>
> *PMODE is the mode of the inputs (in case they are const_int).
>
> + *OPTAB is the optab to check for OPTAB_DIRECT support. Defaults to
> + cbranch_optab.
> +
> This function performs all the setup necessary so that the caller only has
> to emit a single comparison insn. This setup can involve doing a BLKmode
> comparison or emitting a library call to perform the comparison if no insn
> @@ -4414,9 +4419,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
> comparisons must have already been folded. */
>
> static void
> -prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> +prepare_cmp_insn (rtx x, rtx y, rtx *mask, enum rtx_code comparison, rtx size,
> int unsignedp, enum optab_methods methods,
> - rtx *ptest, machine_mode *pmode)
> + rtx *ptest, machine_mode *pmode, optab optab)
> {
> machine_mode mode = *pmode;
> rtx libfunc, test;
> @@ -4534,7 +4539,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
> {
> enum insn_code icode;
> - icode = optab_handler (cbranch_optab, cmp_mode);
> + icode = optab_handler (optab, cmp_mode);
> if (icode != CODE_FOR_nothing
> && insn_operand_matches (icode, 0, test))
> {
> @@ -4566,8 +4571,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> /* Small trick if UNORDERED isn't implemented by the hardware. */
> if (comparison == UNORDERED && rtx_equal_p (x, y))
> {
> - prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> - ptest, pmode);
> +      prepare_cmp_insn (x, y, mask, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> + ptest, pmode, optab);
> if (*ptest)
> return;
> }
> @@ -4618,8 +4623,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> }
>
> *pmode = ret_mode;
> - prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
> - ptest, pmode);
> + prepare_cmp_insn (x, y, mask, comparison, NULL_RTX, unsignedp, methods,
> + ptest, pmode, optab);
> }
>
> return;
> @@ -4657,9 +4662,10 @@ prepare_operand (enum insn_code icode, rtx x, int opnum, machine_mode mode,
> we can do the branch. */
>
> static void
> -emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
> - direct_optab cmp_optab, profile_probability prob,
> - bool test_branch)
> +emit_cmp_and_jump_insn_1 (rtx test, rtx cond, rtx inactive, machine_mode mode,
> + rtx label, direct_optab cmp_optab,
> + profile_probability prob, bool test_branch,
> + bool len_op)
> {
> machine_mode optab_mode;
> enum mode_class mclass;
> @@ -4672,12 +4678,20 @@ emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
>
> gcc_assert (icode != CODE_FOR_nothing);
> gcc_assert (test_branch || insn_operand_matches (icode, 0, test));
> + gcc_assert (cond == NULL_RTX || (cond != NULL_RTX && !test_branch));
> if (test_branch)
> insn = emit_jump_insn (GEN_FCN (icode) (XEXP (test, 0),
> XEXP (test, 1), label));
> - else
> + else if (cond == NULL_RTX)
> insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
> XEXP (test, 1), label));
> + else if (len_op)
> + insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
> + XEXP (test, 1), cond, inactive,
> + label));
> + else
> + insn = emit_jump_insn (GEN_FCN (icode) (test, cond, XEXP (test, 0),
> + XEXP (test, 1), inactive, label));
>
> if (prob.initialized_p ()
> && profile_status_for_fn (cfun) != PROFILE_ABSENT
> @@ -4796,22 +4810,202 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
> if (unsignedp)
> comparison = unsigned_condition (comparison);
>
> - prepare_cmp_insn (op0, op1, comparison, size, unsignedp, OPTAB_LIB_WIDEN,
> - &test, &mode);
> + /* cbranch is no longer allowed for vectors, so when using a vector mode
> + check vec_cbranch variants instead. */
> + if (!VECTOR_MODE_P (GET_MODE (op0)))
> + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> + OPTAB_LIB_WIDEN, &test, &mode, cbranch_optab);
>
> /* Check if we're comparing a truth type with 0, and if so check if
> the target supports tbranch. */
> machine_mode tmode = mode;
> direct_optab optab;
> - if (op1 == CONST0_RTX (GET_MODE (op1))
> - && validate_test_and_branch (val, &test, &tmode,
> - &optab) != CODE_FOR_nothing)
> + if (op1 == CONST0_RTX (GET_MODE (op1)))
> {
> - emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> - return;
> + if (!VECTOR_MODE_P (GET_MODE (op1))
> + && validate_test_and_branch (val, &test, &tmode,
> + &optab) != CODE_FOR_nothing)
> + {
> + emit_cmp_and_jump_insn_1 (test, NULL_RTX, NULL_RTX, tmode, label,
> + optab, prob, true, false);
> + return;
> + }
> +
> +      /* If we are comparing equality with 0, check if VAL is another equality
> +	 comparison and if the target supports it directly.  */
> + gimple *def_stmt = NULL;
> + if (val && TREE_CODE (val) == SSA_NAME
> + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> + && (comparison == NE || comparison == EQ)
> + && (def_stmt = get_gimple_for_ssa_name (val)))
> + {
> + tree masked_op = NULL_TREE;
> + tree len_op = NULL_TREE;
> + tree len_else_op = NULL_TREE;
> +	  /* First determine if the operation should be masked or unmasked.  */
> + if (is_gimple_assign (def_stmt)
> + && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
> + {
> +	      /* See if one side is a comparison; if so use the other side as
> + the mask. */
> + gimple *mask_def = NULL;
> + tree rhs1 = gimple_assign_rhs1 (def_stmt);
> + tree rhs2 = gimple_assign_rhs2 (def_stmt);
> + if ((mask_def = get_gimple_for_ssa_name (rhs1))
> + && is_gimple_assign (mask_def)
> +		  && TREE_CODE_CLASS (gimple_assign_rhs_code (mask_def)) == tcc_comparison)
> + masked_op = rhs2;
> + else if ((mask_def = get_gimple_for_ssa_name (rhs2))
> + && is_gimple_assign (mask_def)
> +		       && TREE_CODE_CLASS (gimple_assign_rhs_code (mask_def)) == tcc_comparison)
> + masked_op = rhs1;
> +
> + if (masked_op)
> + def_stmt = mask_def;
> + }
> + /* Else check to see if we're a LEN target. */
> + else if (is_gimple_call (def_stmt)
> + && gimple_call_internal_p (def_stmt)
> +		   && gimple_call_internal_fn (def_stmt) == IFN_VCOND_MASK_LEN)
> + {
> + /* Example to consume:
> +
> + a = _59 != vect__4.17_75;
> + vcmp = .VCOND_MASK_LEN (a, { -1, ... }, { 0, ... }, _90,
> 0);
> + if (vcmp != { 0, ... })
> +
> + and transform into
> +
> + if (cond_len_vec_cbranch_any (a, _90, 0)). */
> + gcall *call = dyn_cast <gcall *> (def_stmt);
> + tree true_branch = gimple_call_arg (call, 1);
> + tree false_branch = gimple_call_arg (call, 2);
> + if (integer_minus_onep (true_branch)
> + && integer_zerop (false_branch))
> + {
> + len_op = gimple_call_arg (call, 3);
> + len_else_op = gimple_call_arg (call, 4);
> +
> + def_stmt = SSA_NAME_DEF_STMT (gimple_call_arg (call, 0));
> + }
> + }
> +
> + bool cond_op = masked_op || len_op;
> + enum insn_code icode;
> + if (is_gimple_assign (def_stmt)
> + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
> + == tcc_comparison)
> + {
> + class expand_operand ops[4];
> + rtx_insn *tmp = NULL;
> + start_sequence ();
> + rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
> + rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
> + machine_mode mode2 = GET_MODE (op0c);
> +
> + int nops = cond_op ? 4 : 2;
> + int offset = masked_op ? 1 : 0;
> + create_input_operand (&ops[offset + 0], op0c, mode2);
> + create_input_operand (&ops[offset + 1], op1c, mode2);
> + if (masked_op)
> + {
> + rtx mask_op = expand_normal (masked_op);
> + auto mask_mode = GET_MODE (mask_op);
> + create_input_operand (&ops[0], mask_op, mask_mode);
> + create_input_operand (&ops[3], CONST0_RTX (mask_mode),
> + mask_mode);
> + }
> + else if (len_op)
> + {
> + rtx len_op2 = expand_normal (len_op);
> + rtx len_else_op2 = expand_normal (len_else_op);
> + create_input_operand (&ops[2], len_op2, GET_MODE (len_op2));
> + create_input_operand (&ops[3], len_else_op2,
> + GET_MODE (len_else_op2));
> + }
> +
> + int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
> + auto inner_code = gimple_assign_rhs_code (def_stmt);
> + rtx test2 = NULL_RTX;
> +
> +	      enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
> + if (unsignedp2)
> + comparison2 = unsigned_condition (comparison2);
> + if (comparison == NE)
> + optab = masked_op ? cond_vec_cbranch_any_optab
> + : len_op ? cond_len_vec_cbranch_any_optab
> + : vec_cbranch_any_optab;
> + else
> + optab = masked_op ? cond_vec_cbranch_all_optab
> + : len_op ? cond_len_vec_cbranch_all_optab
> + : vec_cbranch_all_optab;
> +
> + if ((icode = optab_handler (optab, mode2))
> + != CODE_FOR_nothing
> + && maybe_legitimize_operands (icode, 1, nops, ops))
> + {
> + rtx cond = masked_op ? ops[0].value
> + : len_op ? ops[2].value : NULL_RTX;
> + rtx inactive
> + = masked_op || len_op ? ops[3].value : NULL_RTX;
> + test2 = gen_rtx_fmt_ee (comparison2, VOIDmode,
> + ops[offset + 0].value,
> + ops[offset + 1].value);
> + if (insn_operand_matches (icode, 0, test2))
> + {
> + emit_cmp_and_jump_insn_1 (test2, cond, inactive, mode2,
> + label, optab, prob, false,
> + len_op);
> + tmp = get_insns ();
> + }
> + }
> +
> + end_sequence ();
> + if (tmp)
> + {
> + emit_insn (tmp);
> + return;
> + }
> + }
> + }
> + }
> +
> + /* cbranch should only be used for VECTOR_BOOLEAN_TYPE_P values. */
> + direct_optab base_optab = cbranch_optab;
> + if (VECTOR_MODE_P (GET_MODE (op0)))
> + {
> + /* If cbranch is provided, use it. If we get here it means we have an
> + instruction in between what created the boolean value and the gcond
> + that is not a masking operation. This can happen for instance during
> + unrolling of early-break where we have an OR-reduction to reduce the
> +	 masks.  In this case knowing we have a mask can let us generate better
> +	 code.  If it's not there, then check the vector specific optabs.  */
> + if (optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> + {
> + if (comparison == NE)
> + base_optab = vec_cbranch_any_optab;
> + else
> + base_optab = vec_cbranch_all_optab;
> +
> + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> + OPTAB_DIRECT, &test, &mode, base_optab);
> +
> + enum insn_code icode = optab_handler (base_optab, mode);
> +
> +	  /* If the new cbranch isn't supported, fall back to the old one.  */
> + if (icode == CODE_FOR_nothing
> + || !test
> + || !insn_operand_matches (icode, 0, test))
> + base_optab = cbranch_optab;
> + }
> +
> + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> + OPTAB_LIB_WIDEN, &test, &mode, base_optab);
> }
>
> - emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
> +  emit_cmp_and_jump_insn_1 (test, NULL_RTX, NULL_RTX, mode, label, base_optab,
> +			    prob, false, false);
> }
>
> /* Overloaded version of emit_cmp_and_jump_insns in which VAL is unknown. */
> @@ -5099,9 +5293,9 @@ emit_conditional_move (rtx target, struct rtx_comparison comp,
> else if (rtx_equal_p (orig_op1, op3))
> op3p = XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> }
> - prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> + prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1), NULL,
> GET_CODE (comparison), NULL_RTX, unsignedp,
> - OPTAB_WIDEN, &comparison, &cmpmode);
> +		    OPTAB_WIDEN, &comparison, &cmpmode, cbranch_optab);
> if (comparison)
> {
> rtx res = emit_conditional_move_1 (target, comparison,
> @@ -5316,9 +5510,9 @@ emit_conditional_add (rtx target, enum rtx_code code, rtx op0, rtx op1,
>
> do_pending_stack_adjust ();
> last = get_last_insn ();
> - prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> - GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
> - &comparison, &cmode);
> + prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1), NULL,
> + GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
> + &comparison, &cmode, cbranch_optab);
> if (comparison)
> {
> class expand_operand ops[4];
> @@ -6132,8 +6326,8 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx op2, rtx tcode)
>
> do_pending_stack_adjust ();
> start_sequence ();
> - prepare_cmp_insn (op1, op2, code, NULL_RTX, false, OPTAB_DIRECT,
> - &trap_rtx, &mode);
> + prepare_cmp_insn (op1, op2, NULL, code, NULL_RTX, false, OPTAB_DIRECT,
> + &trap_rtx, &mode, cbranch_optab);
> if (!trap_rtx)
> insn = NULL;
> else
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index b6f290a95130cd53e94af2249c02a53f01ca3890..371514f3dbe41f1336475f99d1b837c24fa3b818 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -268,6 +268,8 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
> OPTAB_D (cond_fnma_optab, "cond_fnma$a")
> OPTAB_D (cond_fnms_optab, "cond_fnms$a")
> OPTAB_D (cond_neg_optab, "cond_neg$a")
> +OPTAB_D (cond_vec_cbranch_any_optab, "cond_vec_cbranch_any$a")
> +OPTAB_D (cond_vec_cbranch_all_optab, "cond_vec_cbranch_all$a")
> OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
> OPTAB_D (cond_len_add_optab, "cond_len_add$a")
> OPTAB_D (cond_len_sub_optab, "cond_len_sub$a")
> @@ -295,6 +297,8 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
> OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
> OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
> OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> +OPTAB_D (cond_len_vec_cbranch_any_optab, "cond_len_vec_cbranch_any$a")
> +OPTAB_D (cond_len_vec_cbranch_all_optab, "cond_len_vec_cbranch_all$a")
> OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
> OPTAB_D (cstore_optab, "cstore$a4")
> OPTAB_D (ctrap_optab, "ctrap$a4")
> @@ -427,6 +431,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
> OPTAB_D (umulhs_optab, "umulhs$a3")
> OPTAB_D (umulhrs_optab, "umulhrs$a3")
> OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a")
> +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a")
> OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
> OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>
>
> --
--
BR,
Hongtao