https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:1c9d321611367608d6bc1d97cf35b4c1bcb4b2d1

commit r16-5585-g1c9d321611367608d6bc1d97cf35b4c1bcb4b2d1
Author: Tamar Christina <[email protected]>
Date:   Tue Nov 25 12:51:31 2025 +0000

    middle-end: support new {cond{_len}_}vec_cbranch_{any|all} optabs [PR118974]

    This patch introduces six new vector cbranch optabs:

    1. vec_cbranch_any and vec_cbranch_all.
    2. cond_vec_cbranch_any and cond_vec_cbranch_all.
    3. cond_len_vec_cbranch_any and cond_len_vec_cbranch_all.

    Today cbranch can be used for both vector and scalar modes.  In both these
    cases it's intended to compare boolean values, either scalar or vector.

    The optab documentation does not however state that it can only handle
    comparisons against 0.  So many targets have added code for the vector
    variant that tries to deal with the case where we branch based on two
    non-zero registers.

    However this code can't ever be reached because the cbranch expansion
    only deals with comparisons against 0 for vectors.  This is because for
    vectors the rest of the compiler has no way to generate a non-zero
    comparison.  E.g. the vectorizer will always generate a zero comparison,
    and the C/C++ front-ends won't allow vectors to be used in a cbranch as
    it expects a boolean value.  ISAs like SVE work around this by requiring
    you to use an SVE PTEST intrinsic, which results in a single scalar
    boolean value that represents the flag values.

    e.g. if (svptest_any (..))

    The natural question is why we do not, at expand time, rewrite the
    comparison to a non-zero comparison if the target supports it.

    The reason is we can't safely do so.  For an ANY comparison (e.g. != b)
    this is trivial, but for an ALL comparison (e.g. == b) we would have to
    both flip the branch and invert the value being compared, i.e. we have
    to make it a != b comparison.

    But in emit_cmp_and_jump_insns we can't flip the branches anymore because
    they have already been lowered into a fall through branch (PC) and a
    label, ready for use in an if_then_else RTL expression.
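
    The ANY/ALL relationship described above can be sketched with a small
    lane-level model.  This is only an illustration in Python, not GCC code;
    the helper names (any_ne, all_eq, branch_on_all_eq,
    branch_on_any_ne_flipped) are hypothetical and vectors are modelled as
    plain lists of integers.

```python
def any_ne(a, b):
    """True if at least one lane of a differs from the same lane of b."""
    return any(x != y for x, y in zip(a, b))


def all_eq(a, b):
    """True if every lane of a equals the same lane of b."""
    return all(x == y for x, y in zip(a, b))


def branch_on_all_eq(a, b, taken, fallthrough):
    """Branch written directly as an ALL (==) comparison."""
    return taken if all_eq(a, b) else fallthrough


def branch_on_any_ne_flipped(a, b, taken, fallthrough):
    """Same decision expressed as an ANY (!=) comparison.

    The comparison is inverted (== becomes !=), so the two branch
    targets have to be swapped to keep the behaviour identical.
    """
    return fallthrough if any_ne(a, b) else taken
```

    Both branch helpers pick the same target for any input, which is why
    rewriting an ALL comparison needs the branch targets to still be
    swappable at that point.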

    Now why does any of this matter?  Well there are three optimizations we
    want to be able to do.

    1. Adv. SIMD does not support a vector !=, as in there's no instruction
       for it.  For both Integer and FP vectors we perform the comparisons as
       EQ and then invert the resulting mask.  Ideally we'd like to replace
       this with just a XOR and the appropriate branch.

    2. When on an SVE enabled system we would like to use an SVE compare +
       branch for the Adv. SIMD sequence which could happen due to cost
       modelling.  However we can only do so based on if we know that the
       values being compared against are the boolean masks.  This means we
       can't really use combine to do this because combine would have to
       match the entire sequence including the vector comparisons because at
       RTL we've lost the information that VECTOR_BOOLEAN_P would have given
       us.  This sequence would be too long for combine to match due to it
       having to match the compare + branch sequence being generated as well.
       It also becomes a bit messy to match ANY and ALL sequences.

    3. For SVE systems we would like to avoid generating the PTEST operation
       whenever possible.  Because SVE vector integer comparisons already set
       flags we don't need the PTEST on an any or all check.  Eliminating
       this in RTL is difficult, so the best approach is to not generate the
       PTEST at all when not needed.
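
    As a rough illustration of point 1, the sketch below models an
    "any lane differs" check two ways for integer lanes: as EQ followed by
    an inverted mask, and as a plain XOR.  It is a Python model only, with
    hypothetical helper names (cmeq, any_ne_via_eq_and_not, any_ne_via_xor),
    and it deliberately covers integer lanes only; it does not model FP
    comparisons.

```python
def cmeq(a, b):
    """Lane-wise equality: -1 (all ones) where lanes match, else 0."""
    return [-1 if x == y else 0 for x, y in zip(a, b)]


def any_ne_via_eq_and_not(a, b):
    """Compare as EQ, invert the mask, then test for any set lane."""
    inverted = [~m for m in cmeq(a, b)]  # ~(-1) == 0, ~0 == -1
    return any(m != 0 for m in inverted)


def any_ne_via_xor(a, b):
    """XOR the integer lanes; a non-zero lane means the inputs differ."""
    return any((x ^ y) != 0 for x, y in zip(a, b))
```

    For integer lanes both helpers agree on every input, so the EQ + invert
    pair can be modelled by the single XOR when only the branch decision is
    needed.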

    To handle these three cases the new optabs are added and the current
    cbranch is no longer required if the target does not need help in
    distinguishing between boolean vector vs data vector operands.

    This difference is not important for correctness, but it is for
    optimization.  So I've chosen not to deprecate the cbranch_optab but make
    it completely optional.

    I'll try to explain why:

    An example is when unrolling is done on Adv. SIMD early break loops.

    We generate

      vect__1.8_29 = MEM <vector(4) int> [(int *)_25];
      vect__1.9_31 = MEM <vector(4) int> [(int *)_25 + 16B];
      mask_patt_10.10_32 = vect__1.8_29 == { 124, 124, 124, 124 };
      mask_patt_10.10_33 = vect__1.9_31 == { 124, 124, 124, 124 };
      vexit_reduc_34 = .VEC_TRUNC_ADD_HIGH (mask_patt_10.10_33, mask_patt_10.10_32);
      if (vexit_reduc_34 != { 0, 0, 0, 0 })
        goto <bb 4>; [5.50%]
      else
        goto <bb 18>; [94.50%]

    And so the new optabs aren't immediately useful because the comparisons
    can't be done by the optab itself.

    As such vec_cbranch_any would be called with vexit_reduc_34 and
    { 0, 0, 0, 0 }; however, since this expects to perform the comparison
    itself, we end up with

            ldp     q30, q31, [x0], 32
            cmeq    v30.4s, v30.4s, v27.4s
            cmeq    v31.4s, v31.4s, v27.4s
            addhn   v31.4h, v31.4s, v30.4s
            cmtst   v31.4h, v31.4h, v31.4h
            fmov    x3, d31
            cbz     x3, .L2

    instead of

            ldp     q30, q31, [x0], 32
            cmeq    v30.4s, v30.4s, v27.4s
            cmeq    v31.4s, v31.4s, v27.4s
            addhn   v31.4h, v31.4s, v30.4s
            fmov    x3, d31
            cbz     x3, .L2

    because we don't know that the value is already a boolean -1/0 value.
    Without this we can't safely skip the compare.

    The conversion is needed because e.g. it's not valid to drop the compare
    with zero when the vector just contains data:

    v30.8h = [ 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 ]
    cmeq   v31.8h, v30.8h, #0        // -> v31.8h = [0,0,0,0,0,0,0,0]
    umaxp  v31.4s, v31.4s, v31.4s    // pairwise-OR over 0/FFFF masks -> still [0,0,0,0]
    fmov   x7, d31                   // x7 = 0
    cbnz   x7, .L6                   // NOT taken (correct: there were no zeros)

    vs

    umaxp v31.4s, v31.4s, v31.4s     // pairwise unsigned max:
                                     //   [ max(0x00020001,0x00040003)=0x00040003,
                                     //     max(0x00060005,0x00080007)=0x00080007, ... ]
    fmov  x7, d31                    // x7 = 0x0008000700040003  (non-zero)
    cbnz  x7, .L66                   // TAKEN
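
    The two sequences above can be checked with a small lane-level model.
    This is a sketch in Python rather than real AArch64 code; the helper
    names (pack_h_to_s, cmeq_zero_h, umaxp_s, low64, reduce_to_scalar) are
    hypothetical, and the model assumes little-endian lane packing and that
    both umaxp source operands are the same register, as in the example.

```python
def pack_h_to_s(h):
    """Reinterpret 8 x 16-bit lanes as 4 x 32-bit lanes (little-endian)."""
    return [h[2 * i] | (h[2 * i + 1] << 16) for i in range(4)]


def cmeq_zero_h(h):
    """Model of cmeq against #0 on .8h: 0xFFFF where a lane is 0, else 0."""
    return [0xFFFF if x == 0 else 0 for x in h]


def umaxp_s(n, m):
    """Model of umaxp on .4s: unsigned max of adjacent pairs of n then m."""
    cat = n + m
    return [max(cat[2 * i], cat[2 * i + 1]) for i in range(4)]


def low64(s):
    """Model of fmov x, d: the low 64 bits of the vector."""
    return s[0] | (s[1] << 32)


def reduce_to_scalar(h):
    """umaxp + fmov applied to an .8h value viewed as .4s."""
    s = pack_h_to_s(h)
    return low64(umaxp_s(s, s))


data = [0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008]

# With the compare: the mask is all zero, so the branch is not taken.
with_compare = reduce_to_scalar(cmeq_zero_h(data))

# Without the compare: raw data is reduced, so the branch is (wrongly) taken.
without_compare = reduce_to_scalar(data)
```

    In this model with_compare is 0 and without_compare is
    0x0008000700040003, matching the values in the comments above.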

    As such, to avoid the extra compare on boolean vectors, we still need the
    cbranch_optab, or the new vec_cbranch_* optabs need an extra operand to
    indicate what kind of data they hold.  Note that this isn't an issue for
    SVE because SVE has BImode for booleans.

    With these two optabs it's trivial to implement all the optimizations I
    described above.

    I.e. with them we can now generate

    .L2:
            ldr     q31, [x1, x2]
            add     v29.4s, v29.4s, v25.4s
            add     v28.4s, v28.4s, v26.4s
            add     v31.4s, v31.4s, v30.4s
            str     q31, [x1, x2]
            add     x1, x1, 16
            cmp     x1, 2560
            beq     .L1
    .L6:
            ldr     q30, [x3, x1]
            cmpeq   p15.s, p7/z, z30.s, z27.s
            b.none  .L2

    and easily prove it correct.

    gcc/ChangeLog:

            PR target/118974
            * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab,
            cond_vec_cbranch_any_optab, cond_vec_cbranch_all_optab,
            cond_len_vec_cbranch_any_optab, cond_len_vec_cbranch_all_optab):
            New.
            * doc/md.texi: Document them.
            * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
            instead of hardcoded cbranch and support mask and len.
            (emit_cmp_and_jump_insn_1, emit_cmp_and_jump_insns): Implement
            them.
            (emit_conditional_move, emit_conditional_add, gen_cond_trap):
            Update after changing function signatures to support new optabs.
