On Tue, Nov 18, 2025 at 3:52 PM Tamar Christina <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Hongtao Liu <[email protected]>
> > Sent: 18 November 2025 06:51
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected]; nd <[email protected]>; [email protected]
> > Subject: Re: [PATCH v2 1/3]middle-end: support new
> > {cond{_len}_}vec_cbranch_{any|all} optabs [PR118974]
> >
> > On Tue, Nov 18, 2025 at 7:14 AM Tamar Christina
> > <[email protected]> wrote:
> > >
> > > This patch introduces six new vector cbranch optabs
> > >
> > > 1. vec_cbranch_any and vec_cbranch_all.
> > > 2. cond_vec_cbranch_any and cond_vec_cbranch_all.
> > > 3. cond_len_vec_cbranch_any and cond_len_vec_cbranch_all.
> > >
> > > Today cbranch can be used for both vector and scalar modes.  In both
> > > cases it's intended to compare boolean values, either scalar or vector.
> > >
> > > The optab documentation does not however state that it can only handle
> > > comparisons against 0.  So many targets have added code for the vector
> > > variant that tries to deal with the case where we branch based on two
> > > non-zero registers.
> > >
> > > However this code can't ever be reached, because the cbranch expansion
> > > only deals with comparisons against 0 for vectors.  This is because for
> > > vectors the rest of the compiler has no way to generate a non-zero
> > > comparison: e.g. the vectorizer will always generate a zero comparison,
> > > and the C/C++ front-ends won't allow vectors to be used in a cbranch as
> > > they expect a boolean value.  ISAs like SVE work around this by requiring
> > > you to use an SVE PTEST intrinsic, which results in a single scalar
> > > boolean value that represents the flag values.
> > >
> > > e.g. if (svptest_any (..))
> > >
> > > The natural question is why we do not then, at expand time, rewrite the
> > > comparison into a non-zero comparison if the target supports it.
> > >
> > > The reason is we can't safely do so.  For an ANY comparison (e.g. a != b)
> > > this is trivial, but for an ALL comparison (e.g. a == b) we would have to
> > > both flip the branch and invert the value being compared, i.e. we have to
> > > make it an a != b comparison.
> > >
> > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because
> > > they have already been lowered into a fall-through branch (PC) and a
> > > label, ready for use in an if_then_else RTL expression.
> > >
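> > > To make the ANY/ALL duality concrete, here's a minimal scalarised C++
> > > sketch (illustrative helper names only, not the expand-time code): an
> > > ALL compare is exactly the negation of an ANY compare on the inverted
> > > condition, which is why rewriting ALL as ANY must also flip which branch
> > > edge is taken.
> > >
> > >   #include <array>
> > >   #include <cstddef>
> > >
> > >   /* ANY: true if at least one elementwise comparison holds.  */
> > >   static bool
> > >   any_ne (const std::array<int, 4> &a, const std::array<int, 4> &b)
> > >   {
> > >     for (std::size_t i = 0; i < a.size (); i++)
> > >       if (a[i] != b[i])
> > >         return true;
> > >     return false;
> > >   }
> > >
> > >   /* ALL (a == b) is !ANY (a != b), so an "all equal" branch taken on
> > >      true corresponds to an "any not-equal" branch taken on false.  */
> > >   static bool
> > >   all_eq (const std::array<int, 4> &a, const std::array<int, 4> &b)
> > >   {
> > >     return !any_ne (a, b);
> > >   }
> > >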
> > > Now why does any of this matter?  Well there are three optimizations we
> > > want to be able to do.
> > >
> > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction
> > >    for it.  For both integer and FP vectors we perform the comparisons as
> > >    EQ and then invert the resulting mask.  Ideally we'd like to replace
> > >    this with just an XOR and the appropriate branch (see the sketch after
> > >    this list).
> > >
> > > 2. When on an SVE enabled system we would like to use an SVE compare +
> > >    branch for the Adv. SIMD sequence, which could happen due to cost
> > >    modelling.  However we can only do so if we know that the values being
> > >    compared are the boolean masks.  This means we can't really use combine
> > >    to do this, because combine would have to match the entire sequence
> > >    including the vector comparisons, since at RTL we've lost the
> > >    information that VECTOR_BOOLEAN_P would have given us.  This sequence
> > >    would be too long for combine to match due to it having to match the
> > >    compare + branch sequence being generated as well.  It also becomes a
> > >    bit messy to match ANY and ALL sequences.
> > >
> > > 3. For SVE systems we would like to avoid generating the PTEST operation
> > >    whenever possible.  Because SVE vector integer comparisons already set
> > >    flags, we don't need the PTEST on an any or all check.  Eliminating
> > >    this in RTL is difficult, so the best approach is to not generate the
> > >    PTEST at all when not needed.
> > >
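> > > As a scalarised illustration of (1), and only an illustration rather
> > > than the actual expansion: for an any-branch on a != b it is enough to
> > > XOR the inputs and test whether the result has a nonzero lane, so no
> > > vector inequality compare (EQ plus invert on Adv. SIMD) is needed before
> > > the branch.
> > >
> > >   #include <array>
> > >   #include <cstddef>
> > >   #include <cstdint>
> > >
> > >   /* "Any lane of a != b" without a vector compare: XOR the lanes and
> > >      OR-reduce them; the branch only needs to know whether the result
> > >      is nonzero.  */
> > >   static bool
> > >   any_ne_via_xor (const std::array<std::uint32_t, 4> &a,
> > >                   const std::array<std::uint32_t, 4> &b)
> > >   {
> > >     std::uint32_t acc = 0;
> > >     for (std::size_t i = 0; i < a.size (); i++)
> > >       acc |= a[i] ^ b[i];
> > >     return acc != 0;
> > >   }
> > >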
> > > To handle these three cases the new optabs are added, and the current
> > > cbranch is no longer required if the target does not need help in
> > > distinguishing between boolean vector vs data vector operands.
> > >
> > > This difference is not important for correctness, but it is for
> > > optimization.  So I've chosen not to deprecate cbranch_optab but to make
> > > it completely optional.
> > >
> > > I'll try to explain why:
> > >
> > > An example is when unrolling is done on Adv. SIMD early break loops.
> > >
> > > We generate
> > >
> > > vect__1.8_29 = MEM <vector(4) int> [(int *)_25];
> > > vect__1.9_31 = MEM <vector(4) int> [(int *)_25 + 16B];
> > > mask_patt_10.10_32 = vect__1.8_29 == { 124, 124, 124, 124 };
> > > mask_patt_10.10_33 = vect__1.9_31 == { 124, 124, 124, 124 };
> > > vexit_reduc_34 = .VEC_TRUNC_ADD_HIGH (mask_patt_10.10_33, mask_patt_10.10_32);
> > > if (vexit_reduc_34 != { 0, 0, 0, 0 })
> > > goto <bb 4>; [5.50%]
> > > else
> > > goto <bb 18>; [94.50%]
> > >
> > > And so the new optabs aren't immediately useful here because the
> > > comparisons can't be done by the optab itself.
> > >
> > > As such vec_cbranch_any would be called with vexit_reduc_34 and
> > > { 0, 0, 0, 0 }; however, since this optab expects to perform the
> > > comparison itself, we end up with
> > >
> > > ldp q30, q31, [x0], 32
> > > cmeq v30.4s, v30.4s, v27.4s
> > > cmeq v31.4s, v31.4s, v27.4s
> > > addhn v31.4h, v31.4s, v30.4s
> > > cmtst v31.4h, v31.4h, v31.4h
> > > fmov x3, d31
> > > cbz x3, .L2
> > >
> > > instead of
> > >
> > > ldp q30, q31, [x0], 32
> > > cmeq v30.4s, v30.4s, v27.4s
> > > cmeq v31.4s, v31.4s, v27.4s
> > > addhn v31.4h, v31.4s, v30.4s
> > > fmov x3, d31
> > > cbz x3, .L2
> > >
> > > because we don't know that the value is already a boolean -1/0 value.
> > > Without that knowledge we can't safely omit the compare.
> > >
> > > The conversion is needed because e.g. it's not valid to drop the compare
> > > with zero when the vector just contains data:
> > >
> > >   v30.8h = [ 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 ]
> > >   cmeq  v31.8h, v30.8h, #0      // -> v31.8h = [0,0,0,0,0,0,0,0]
> > >   umaxp v31.4s, v31.4s, v31.4s  // pairwise-OR over 0/FFFF masks -> still [0,0,0,0]
> > >   fmov  x7, d31                 // x7 = 0
> > >   cbnz  x7, .L6                 // NOT taken (correct: there were no zeros)
> > >
> > > vs
> > >
> > >   umaxp v31.4s, v31.4s, v31.4s  // pairwise unsigned max:
> > >                                 // [ max(0x00020001,0x00040003)=0x00040003,
> > >                                 //   max(0x00060005,0x00080007)=0x00080007, ... ]
> > >   fmov  x7, d31                 // x7 = 0x0008000700040003 (non-zero)
> > >   cbnz  x7, .L6                 // TAKEN
> > >
> > > As such, to avoid the extra compare on boolean vectors, we either still
> > > need the cbranch_optab, or the new vec_cbranch_* optabs need an extra
> > > operand to indicate what kind of data they hold.  Note that this isn't
> > > an issue for SVE because SVE has BImode for booleans.
> > >
> > > With these two optabs it's trivial to implement all the optimizations I
> > > described above.
> > >
> > > I.e. with them we can now generate
> > >
> > > .L2:
> > > ldr q31, [x1, x2]
> > > add v29.4s, v29.4s, v25.4s
> > > add v28.4s, v28.4s, v26.4s
> > > add v31.4s, v31.4s, v30.4s
> > > str q31, [x1, x2]
> > > add x1, x1, 16
> > > cmp x1, 2560
> > > beq .L1
> > > .L6:
> > > ldr q30, [x3, x1]
> > > cmpeq p15.s, p7/z, z30.s, z27.s
> > > b.none .L2
> > >
> > > and easily prove it correct.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64)
> > > with no issues.
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR target/118974
> > >         * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab,
> > >         cond_vec_cbranch_any_optab, cond_vec_cbranch_all_optab,
> > >         cond_len_vec_cbranch_any_optab, cond_len_vec_cbranch_all_optab): New.
> > >         * doc/md.texi: Document them.
> > >         * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
> > >         instead of hardcoded cbranch and support mask and len.
> > >         (emit_cmp_and_jump_insn_1, emit_cmp_and_jump_insns): Implement them.
> > >         (emit_conditional_move, emit_conditional_add, gen_cond_trap): Update
> > >         after changing function signatures to support new optabs.
> > >
> > > ---
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index ae5d709bd47945272e6f45f83840e21c68bb6534..e668048a387e146b072d414168c5ed6db3707609 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -7664,8 +7664,65 @@ position of Operand 1 to test.  Operand 3 is the @code{code_label} to jump to.
> > >  Conditional branch instruction combined with a compare instruction.
> > >  Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > >  first and second operands of the comparison, respectively.  Operand 3
> > > +is the @code{code_label} to jump to.  This optab is only used for comparisons of
> > > +VECTOR_BOOLEAN_TYPE_P values and is never called for data registers.  Data
> > > +vector operands should use one of the patterns below instead.
> > > +
> > > +@cindex @code{vec_cbranch_any@var{mode}} instruction pattern
> > > +@item @samp{vec_cbranch_any@var{mode}}
> > > +Conditional branch instruction based on a vector compare that branches
> > > +when at least one of the elementwise comparisons of the two input
> > > +vectors is true.
> > > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> > > +first and second operands of the comparison, respectively. Operand 3
> > > is the @code{code_label} to jump to.
> > >
> > > +@cindex @code{vec_cbranch_all@var{mode}} instruction pattern
> > > +@item @samp{vec_cbranch_all@var{mode}}
> > > +Conditional branch instruction based on a vector compare that branches
> > > +when all of the elementwise comparisons of the two input vectors are true.
> > > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the
> > > +first and second operands of the comparison, respectively. Operand 3
> > > +is the @code{code_label} to jump to.
> > > +
> > > +@cindex @code{cond_vec_cbranch_any@var{mode}} instruction pattern
> > > +@item @samp{cond_vec_cbranch_any@var{mode}}
> > > +Masked conditional branch instruction based on a vector compare that branches
> > > +when at least one of the elementwise comparisons of the two input
> > > +vectors is true.
> > > +Operand 0 is a comparison operator.  Operand 1 is the mask operand.
> > > +Operand 2 and operand 3 are the first and second operands of the comparison,
> > > +respectively.  Operand 4 is the else value for the masked operation.
> > > +Operand 5 is the @code{code_label} to jump to.
> >
> > Hello, I'd like to confirm that operand[4] is also a mask with the
> > same mode as operands[1], and each corresponding bit represents its
> > else value?
> > I notice the aarch64 backend patch defines operands[4] as
> > aarch64_simd_imm_zero, but it can be any mask, right?
> >
>
> Hi, yeah that's right.  That's a good indication I need to add that to the
> docs.
> Now however I only try a vector of all zeros in expand atm, mainly because
> I don't know where to get the other possible else values for non-LEN-based
> targets.
> There's a target hook preferred_else_value which different targets seem to
> have interpreted slightly differently I think.  AArch64 seems to have
> interpreted it as "given a conditional operation, which value is the else
> value", whereas RISC-V seems to have done the same but defaults to zero if
> the operation is unconditional.
>
> And the default implementation, default_preferred_else_value, returns 0.
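>
> For reference, my mental model of the default hook (sketched from memory,
> so treat the exact parameter list as an assumption) is that it ignores the
> internal function and the operands and just builds a zero constant of the
> requested type:
>
>   /* Default TARGET_PREFERRED_ELSE_VALUE implementation (sketch).  */
>   tree
>   default_preferred_else_value (unsigned, tree type, unsigned, tree *)
>   {
>     return build_zero_cst (type);
>   }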
Thanks for the explanation.  The default being 0 is reasonable for x86 as
well, though x86 also supports leaving the dest of a cond_op unchanged by
default.
I have no further questions.
>
> So open to suggestions here.
>
> Thanks,
> Tamar
>
> > > +
> > > +@cindex @code{cond_vec_cbranch_all@var{mode}} instruction pattern
> > > +@item @samp{cond_vec_cbranch_all@var{mode}}
> > > +Masked conditional branch instruction based on a vector compare that branches
> > > +when all of the elementwise comparisons of the two input vectors are true.
> > > +Operand 0 is a comparison operator.  Operand 1 is the mask operand.
> > > +Operand 2 and operand 3 are the first and second operands of the comparison,
> > > +respectively.  Operand 4 is the else value for the masked operation.
> > > +Operand 5 is the @code{code_label} to jump to.
> > > +
> > > +@cindex @code{cond_len_vec_cbranch_any@var{mode}} instruction pattern
> > > +@item @samp{cond_len_vec_cbranch_any@var{mode}}
> > > +Len based conditional branch instruction based on a vector compare that branches
> > > +when at least one of the elementwise comparisons of the two input
> > > +vectors is true.
> > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the first and
> > > +second operands of the comparison, respectively.  Operand 3 is the len operand.
> > > +Operand 4 is the else value for the masked operation.  Operand 5 is the
> > > +@code{code_label} to jump to.
> > > +
> > > +@cindex @code{cond_len_vec_cbranch_all@var{mode}} instruction pattern
> > > +@item @samp{cond_len_vec_cbranch_all@var{mode}}
> > > +Len based conditional branch instruction based on a vector compare that branches
> > > +when all of the elementwise comparisons of the two input vectors are true.
> > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the first and
> > > +second operands of the comparison, respectively.  Operand 3 is the len operand.
> > > +Operand 4 is the else value for the masked operation.  Operand 5 is the
> > > +@code{code_label} to jump to.
> > > +
> > > @cindex @code{jump} instruction pattern
> > > @item @samp{jump}
> > > A jump inside a function; an unconditional branch. Operand 0 is the
> > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > index 0865fc2e19aeb2b3056c8634334d6c1644a3cc96..1072239fef086e4ed959e472f299ed048fd507ad 100644
> > > --- a/gcc/optabs.cc
> > > +++ b/gcc/optabs.cc
> > > @@ -48,6 +48,8 @@ along with GCC; see the file COPYING3. If not see
> > > #include "langhooks.h"
> > > #include "gimple.h"
> > > #include "ssa.h"
> > > +#include "tree-ssa-live.h"
> > > +#include "tree-outof-ssa.h"
> > >
> > > static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
> > > machine_mode *);
> > > @@ -4405,6 +4407,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
> > >
> > > *PMODE is the mode of the inputs (in case they are const_int).
> > >
> > > + *OPTAB is the optab to check for OPTAB_DIRECT support. Defaults to
> > > + cbranch_optab.
> > > +
> > > This function performs all the setup necessary so that the caller
> > > only has
> > > to emit a single comparison insn. This setup can involve doing a
> > > BLKmode
> > > comparison or emitting a library call to perform the comparison if no
> > > insn
> > > @@ -4414,9 +4419,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
> > > comparisons must have already been folded. */
> > >
> > > static void
> > > -prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > +prepare_cmp_insn (rtx x, rtx y, rtx *mask, enum rtx_code comparison, rtx size,
> > > int unsignedp, enum optab_methods methods,
> > > - rtx *ptest, machine_mode *pmode)
> > > + rtx *ptest, machine_mode *pmode, optab optab)
> > > {
> > > machine_mode mode = *pmode;
> > > rtx libfunc, test;
> > > @@ -4534,7 +4539,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
> > > {
> > > enum insn_code icode;
> > > - icode = optab_handler (cbranch_optab, cmp_mode);
> > > + icode = optab_handler (optab, cmp_mode);
> > > if (icode != CODE_FOR_nothing
> > > && insn_operand_matches (icode, 0, test))
> > > {
> > > @@ -4566,8 +4571,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > /* Small trick if UNORDERED isn't implemented by the hardware. */
> > > if (comparison == UNORDERED && rtx_equal_p (x, y))
> > > {
> > > - prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> > > - ptest, pmode);
> > > +      prepare_cmp_insn (x, y, mask, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> > > + ptest, pmode, optab);
> > > if (*ptest)
> > > return;
> > > }
> > > @@ -4618,8 +4623,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > }
> > >
> > > *pmode = ret_mode;
> > > - prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
> > > - ptest, pmode);
> > > +  prepare_cmp_insn (x, y, mask, comparison, NULL_RTX, unsignedp, methods,
> > > + ptest, pmode, optab);
> > > }
> > >
> > > return;
> > > @@ -4657,9 +4662,10 @@ prepare_operand (enum insn_code icode, rtx x, int opnum, machine_mode mode,
> > > we can do the branch. */
> > >
> > > static void
> > > -emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
> > > -                          direct_optab cmp_optab, profile_probability prob,
> > > -                          bool test_branch)
> > > +emit_cmp_and_jump_insn_1 (rtx test, rtx cond, rtx inactive, machine_mode mode,
> > > +                          rtx label, direct_optab cmp_optab,
> > > +                          profile_probability prob, bool test_branch,
> > > +                          bool len_op)
> > > {
> > > machine_mode optab_mode;
> > > enum mode_class mclass;
> > > @@ -4672,12 +4678,20 @@ emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
> > >
> > > gcc_assert (icode != CODE_FOR_nothing);
> > > gcc_assert (test_branch || insn_operand_matches (icode, 0, test));
> > > + gcc_assert (cond == NULL_RTX || (cond != NULL_RTX && !test_branch));
> > > if (test_branch)
> > > insn = emit_jump_insn (GEN_FCN (icode) (XEXP (test, 0),
> > > XEXP (test, 1), label));
> > > - else
> > > + else if (cond == NULL_RTX)
> > > insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
> > > XEXP (test, 1), label));
> > > + else if (len_op)
> > > +    insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
> > > +                                            XEXP (test, 1), cond, inactive,
> > > +                                            label));
> > > +  else
> > > +    insn = emit_jump_insn (GEN_FCN (icode) (test, cond, XEXP (test, 0),
> > > +                                            XEXP (test, 1), inactive,
> > > +                                            label));
> > >
> > > if (prob.initialized_p ()
> > > && profile_status_for_fn (cfun) != PROFILE_ABSENT
> > > @@ -4796,22 +4810,202 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > if (unsignedp)
> > > comparison = unsigned_condition (comparison);
> > >
> > > - prepare_cmp_insn (op0, op1, comparison, size, unsignedp,
> > OPTAB_LIB_WIDEN,
> > > - &test, &mode);
> > > + /* cbranch is no longer allowed for vectors, so when using a vector
> > > mode
> > > + check vec_cbranch variants instead. */
> > > + if (!VECTOR_MODE_P (GET_MODE (op0)))
> > > + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> > > + OPTAB_LIB_WIDEN, &test, &mode, cbranch_optab);
> > >
> > > /* Check if we're comparing a truth type with 0, and if so check if
> > > the target supports tbranch. */
> > > machine_mode tmode = mode;
> > > direct_optab optab;
> > > - if (op1 == CONST0_RTX (GET_MODE (op1))
> > > - && validate_test_and_branch (val, &test, &tmode,
> > > - &optab) != CODE_FOR_nothing)
> > > + if (op1 == CONST0_RTX (GET_MODE (op1)))
> > > {
> > > - emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> > > - return;
> > > + if (!VECTOR_MODE_P (GET_MODE (op1))
> > > + && validate_test_and_branch (val, &test, &tmode,
> > > + &optab) != CODE_FOR_nothing)
> > > + {
> > > + emit_cmp_and_jump_insn_1 (test, NULL_RTX, NULL_RTX, tmode,
> > label,
> > > + optab, prob, true, false);
> > > + return;
> > > + }
> > > +
> > > + /* If we are comparing equality with 0, check if VAL is another
> > > equality
> > > + comparison and if the target supports it directly. */
> > > + gimple *def_stmt = NULL;
> > > + if (val && TREE_CODE (val) == SSA_NAME
> > > + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> > > + && (comparison == NE || comparison == EQ)
> > > + && (def_stmt = get_gimple_for_ssa_name (val)))
> > > + {
> > > + tree masked_op = NULL_TREE;
> > > + tree len_op = NULL_TREE;
> > > + tree len_else_op = NULL_TREE;
> > > + /* First determine if the operation should be masked or
> > > unmasked. */
> > > + if (is_gimple_assign (def_stmt)
> > > + && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
> > > + {
> > > + /* See if one side if a comparison, if so use the other
> > > side as
> > > + the mask. */
> > > + gimple *mask_def = NULL;
> > > + tree rhs1 = gimple_assign_rhs1 (def_stmt);
> > > + tree rhs2 = gimple_assign_rhs2 (def_stmt);
> > > + if ((mask_def = get_gimple_for_ssa_name (rhs1))
> > > + && is_gimple_assign (mask_def)
> > > + && TREE_CODE_CLASS (gimple_assign_rhs_code (mask_def)))
> > > + masked_op = rhs2;
> > > + else if ((mask_def = get_gimple_for_ssa_name (rhs2))
> > > + && is_gimple_assign (mask_def)
> > > + && TREE_CODE_CLASS (gimple_assign_rhs_code (mask_def)))
> > > + masked_op = rhs1;
> > > +
> > > + if (masked_op)
> > > + def_stmt = mask_def;
> > > + }
> > > + /* Else check to see if we're a LEN target. */
> > > + else if (is_gimple_call (def_stmt)
> > > + && gimple_call_internal_p (def_stmt)
> > > + && gimple_call_internal_fn (def_stmt) ==
> > IFN_VCOND_MASK_LEN)
> > > + {
> > > + /* Example to consume:
> > > +
> > > + a = _59 != vect__4.17_75;
> > > + vcmp = .VCOND_MASK_LEN (a, { -1, ... }, { 0, ... },
> > > _90, 0);
> > > + if (vcmp != { 0, ... })
> > > +
> > > + and transform into
> > > +
> > > + if (cond_len_vec_cbranch_any (a, _90, 0)). */
> > > + gcall *call = dyn_cast <gcall *> (def_stmt);
> > > + tree true_branch = gimple_call_arg (call, 1);
> > > + tree false_branch = gimple_call_arg (call, 2);
> > > + if (integer_minus_onep (true_branch)
> > > + && integer_zerop (false_branch))
> > > + {
> > > + len_op = gimple_call_arg (call, 3);
> > > + len_else_op = gimple_call_arg (call, 4);
> > > +
> > > + def_stmt = SSA_NAME_DEF_STMT (gimple_call_arg (call,
> > > 0));
> > > + }
> > > + }
> > > +
> > > + bool cond_op = masked_op || len_op;
> > > + enum insn_code icode;
> > > + if (is_gimple_assign (def_stmt)
> > > + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
> > > + == tcc_comparison)
> > > + {
> > > + class expand_operand ops[4];
> > > + rtx_insn *tmp = NULL;
> > > + start_sequence ();
> > > + rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
> > > + rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
> > > + machine_mode mode2 = GET_MODE (op0c);
> > > +
> > > + int nops = cond_op ? 4 : 2;
> > > + int offset = masked_op ? 1 : 0;
> > > + create_input_operand (&ops[offset + 0], op0c, mode2);
> > > + create_input_operand (&ops[offset + 1], op1c, mode2);
> > > + if (masked_op)
> > > + {
> > > + rtx mask_op = expand_normal (masked_op);
> > > + auto mask_mode = GET_MODE (mask_op);
> > > + create_input_operand (&ops[0], mask_op, mask_mode);
> > > + create_input_operand (&ops[3], CONST0_RTX (mask_mode),
> > > + mask_mode);
> > > + }
> > > + else if (len_op)
> > > + {
> > > + rtx len_op2 = expand_normal (len_op);
> > > + rtx len_else_op2 = expand_normal (len_else_op);
> > > + create_input_operand (&ops[2], len_op2, GET_MODE
> > > (len_op2));
> > > + create_input_operand (&ops[3], len_else_op2,
> > > + GET_MODE (len_else_op2));
> > > + }
> > > +
> > > + int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
> > > + auto inner_code = gimple_assign_rhs_code (def_stmt);
> > > + rtx test2 = NULL_RTX;
> > > +
> > > + enum rtx_code comparison2 = get_rtx_code (inner_code,
> > unsignedp2);
> > > + if (unsignedp2)
> > > + comparison2 = unsigned_condition (comparison2);
> > > + if (comparison == NE)
> > > + optab = masked_op ? cond_vec_cbranch_any_optab
> > > + : len_op ?
> > > cond_len_vec_cbranch_any_optab
> > > + : vec_cbranch_any_optab;
> > > + else
> > > + optab = masked_op ? cond_vec_cbranch_all_optab
> > > + : len_op ?
> > > cond_len_vec_cbranch_all_optab
> > > + : vec_cbranch_all_optab;
> > > +
> > > + if ((icode = optab_handler (optab, mode2))
> > > + != CODE_FOR_nothing
> > > + && maybe_legitimize_operands (icode, 1, nops, ops))
> > > + {
> > > + rtx cond = masked_op ? ops[0].value
> > > + : len_op ? ops[2].value : NULL_RTX;
> > > + rtx inactive
> > > + = masked_op || len_op ? ops[3].value : NULL_RTX;
> > > + test2 = gen_rtx_fmt_ee (comparison2, VOIDmode,
> > > + ops[offset + 0].value,
> > > + ops[offset + 1].value);
> > > + if (insn_operand_matches (icode, 0, test2))
> > > + {
> > > + emit_cmp_and_jump_insn_1 (test2, cond, inactive,
> > > mode2,
> > > + label, optab, prob, false,
> > > + len_op);
> > > + tmp = get_insns ();
> > > + }
> > > + }
> > > +
> > > + end_sequence ();
> > > + if (tmp)
> > > + {
> > > + emit_insn (tmp);
> > > + return;
> > > + }
> > > + }
> > > + }
> > > + }
> > > +
> > > + /* cbranch should only be used for VECTOR_BOOLEAN_TYPE_P values. */
> > > + direct_optab base_optab = cbranch_optab;
> > > + if (VECTOR_MODE_P (GET_MODE (op0)))
> > > + {
> > > + /* If cbranch is provided, use it. If we get here it means we
> > > have an
> > > + instruction in between what created the boolean value and the
> > > gcond
> > > + that is not a masking operation. This can happen for instance
> > > during
> > > + unrolling of early-break where we have an OR-reduction to reduce
> > > the
> > > + masks. In this case knowing we have a mask can let us generate
> > > better
> > > + code. If it's not there there then check the vector specific
> > > + optabs. */
> > > + if (optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > + {
> > > + if (comparison == NE)
> > > + base_optab = vec_cbranch_any_optab;
> > > + else
> > > + base_optab = vec_cbranch_all_optab;
> > > +
> > > + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> > > + OPTAB_DIRECT, &test, &mode, base_optab);
> > > +
> > > + enum insn_code icode = optab_handler (base_optab, mode);
> > > +
> > > + /* If the new cbranch isn't supported, degrade back to old one.
> > > */
> > > + if (icode == CODE_FOR_nothing
> > > + || !test
> > > + || !insn_operand_matches (icode, 0, test))
> > > + base_optab = cbranch_optab;
> > > + }
> > > +
> > > + prepare_cmp_insn (op0, op1, NULL, comparison, size, unsignedp,
> > > + OPTAB_LIB_WIDEN, &test, &mode, base_optab);
> > > }
> > >
> > > -  emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
> > > +  emit_cmp_and_jump_insn_1 (test, NULL_RTX, NULL_RTX, mode, label, base_optab,
> > > +                            prob, false, false);
> > > }
> > >
> > > /* Overloaded version of emit_cmp_and_jump_insns in which VAL is
> > unknown. */
> > > @@ -5099,9 +5293,9 @@ emit_conditional_move (rtx target, struct rtx_comparison comp,
> > > else if (rtx_equal_p (orig_op1, op3))
> > > op3p = XEXP (comparison, 1) = force_reg (cmpmode,
> > > orig_op1);
> > > }
> > > - prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > > +      prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1), NULL,
> > > GET_CODE (comparison), NULL_RTX, unsignedp,
> > > - OPTAB_WIDEN, &comparison, &cmpmode);
> > > +                        OPTAB_WIDEN, &comparison, &cmpmode, cbranch_optab);
> > > if (comparison)
> > > {
> > > rtx res = emit_conditional_move_1 (target, comparison,
> > > @@ -5316,9 +5510,9 @@ emit_conditional_add (rtx target, enum rtx_code code, rtx op0, rtx op1,
> > >
> > > do_pending_stack_adjust ();
> > > last = get_last_insn ();
> > > - prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > > - GET_CODE (comparison), NULL_RTX, unsignedp,
> > > OPTAB_WIDEN,
> > > - &comparison, &cmode);
> > > + prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1), NULL,
> > > + GET_CODE (comparison), NULL_RTX, unsignedp,
> > > OPTAB_WIDEN,
> > > + &comparison, &cmode, cbranch_optab);
> > > if (comparison)
> > > {
> > > class expand_operand ops[4];
> > > @@ -6132,8 +6326,8 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx op2, rtx tcode)
> > >
> > > do_pending_stack_adjust ();
> > > start_sequence ();
> > > - prepare_cmp_insn (op1, op2, code, NULL_RTX, false, OPTAB_DIRECT,
> > > - &trap_rtx, &mode);
> > > +  prepare_cmp_insn (op1, op2, NULL, code, NULL_RTX, false, OPTAB_DIRECT,
> > > + &trap_rtx, &mode, cbranch_optab);
> > > if (!trap_rtx)
> > > insn = NULL;
> > > else
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index b6f290a95130cd53e94af2249c02a53f01ca3890..371514f3dbe41f1336475f99d1b837c24fa3b818 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -268,6 +268,8 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
> > > OPTAB_D (cond_fnma_optab, "cond_fnma$a")
> > > OPTAB_D (cond_fnms_optab, "cond_fnms$a")
> > > OPTAB_D (cond_neg_optab, "cond_neg$a")
> > > +OPTAB_D (cond_vec_cbranch_any_optab, "cond_vec_cbranch_any$a")
> > > +OPTAB_D (cond_vec_cbranch_all_optab, "cond_vec_cbranch_all$a")
> > > OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
> > > OPTAB_D (cond_len_add_optab, "cond_len_add$a")
> > > OPTAB_D (cond_len_sub_optab, "cond_len_sub$a")
> > > @@ -295,6 +297,8 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
> > > OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
> > > OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
> > > OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> > > +OPTAB_D (cond_len_vec_cbranch_any_optab, "cond_len_vec_cbranch_any$a")
> > > +OPTAB_D (cond_len_vec_cbranch_all_optab, "cond_len_vec_cbranch_all$a")
> > > OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
> > > OPTAB_D (cstore_optab, "cstore$a4")
> > > OPTAB_D (ctrap_optab, "ctrap$a4")
> > > @@ -427,6 +431,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
> > > OPTAB_D (umulhs_optab, "umulhs$a3")
> > > OPTAB_D (umulhrs_optab, "umulhrs$a3")
> > > OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> > > +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a")
> > > +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a")
> > > OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> > > OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
> > > OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> > >
> > >
> > > --
> >
> >
> >
> > --
> > BR,
> > Hongtao
--
BR,
Hongtao