https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:1c9d321611367608d6bc1d97cf35b4c1bcb4b2d1

commit r16-5585-g1c9d321611367608d6bc1d97cf35b4c1bcb4b2d1
Author: Tamar Christina <[email protected]>
Date:   Tue Nov 25 12:51:31 2025 +0000

    middle-end: support new {cond{_len}_}vec_cbranch_{any|all} optabs [PR118974]

    This patch introduces six new vector cbranch optabs:

      1. vec_cbranch_any and vec_cbranch_all.
      2. cond_vec_cbranch_any and cond_vec_cbranch_all.
      3. cond_len_vec_cbranch_any and cond_len_vec_cbranch_all.

    Today cbranch can be used for both vector and scalar modes.  In both
    cases it is intended to compare boolean values, either scalar or
    vector.  The optab documentation does not, however, state that it can
    only handle comparisons against 0.  Many targets have therefore added
    code for the vector variant that tries to deal with the case where we
    branch based on two non-zero registers.  That code can never be
    reached, because the cbranch expansion only deals with comparisons
    against 0 for vectors: the rest of the compiler has no way to generate
    a non-zero vector comparison.  For example, the vectorizer always
    generates a zero comparison, and the C/C++ front ends do not allow
    vectors to be used in a cbranch because they expect a boolean value.

    ISAs like SVE work around this by requiring you to use an SVE PTEST
    intrinsic, which results in a single scalar boolean value that
    represents the flag values, e.g.

      if (svptest_any (..))

    The natural question is: why do we not rewrite the comparison into a
    non-zero comparison at expand time, if the target supports it?  The
    reason is that we cannot do so safely.  For an ANY comparison
    (e.g. a != b) this is trivial, but for an ALL comparison (e.g. a == b)
    we would have to flip the branches and invert the value being
    compared, i.e. turn it into an a != b comparison.  But in
    emit_cmp_and_jump_insns we can no longer flip the branches, because
    they have already been lowered into a fall-through branch (PC) and a
    label, ready for use in an if_then_else RTL expression.

    Now why does any of this matter?  There are three optimizations we
    want to be able to do:

    1. Adv. SIMD does not support a vector !=; there is no instruction
       for it.  For both integer and FP vectors we perform the comparison
       as EQ and then invert the resulting mask.  Ideally we would like to
       replace this with just an XOR and the appropriate branch (see the
       C sketch after this list).

    2. On an SVE-enabled system we would like to use an SVE compare +
       branch for the Adv. SIMD sequence, which could happen due to cost
       modelling.  However, we can only do so if we know that the values
       being compared are boolean masks.  We cannot really use combine for
       this, because combine would have to match the entire sequence,
       including the vector comparisons, since at the RTL level we have
       lost the information that VECTOR_BOOLEAN_P would have given us.
       That sequence is too long for combine to match, as it would also
       have to match the compare + branch sequence being generated, and it
       becomes messy to match both ANY and ALL sequences.

    3. On SVE systems we would like to avoid generating the PTEST
       operation whenever possible.  Because SVE vector integer
       comparisons already set the flags, we do not need the PTEST for an
       any or all check.  Eliminating it in RTL is difficult, so the best
       approach is not to generate the PTEST at all when it is not needed.
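
    As an illustration only (not part of the patch), here is a minimal C
    sketch of the two branch conditions, written with GNU C vector
    extensions.  The helper names any_ne and all_eq are invented for this
    sketch; the point is merely that the ALL form is the ANY form with the
    comparison and the branch sense inverted, which is exactly the rewrite
    that can no longer be done once the branches have been lowered.

      typedef int v4si __attribute__ ((vector_size (16)));

      /* Branch condition "ANY lane of a differs from b".  */
      static inline int
      any_ne (v4si a, v4si b)
      {
        v4si m = a != b;   /* lane-wise mask: -1 where a[i] != b[i], else 0  */
        return (m[0] | m[1] | m[2] | m[3]) != 0;
      }

      /* Branch condition "ALL lanes of a equal b".  On a target without a
         vector != instruction this is simply !any_ne (a, b): invert the
         comparison and flip the branch.  */
      static inline int
      all_eq (v4si a, v4si b)
      {
        return !any_ne (a, b);
      }

    In the patch this inversion has to be decided before the branches are
    lowered, which is why the ALL case gets its own optab rather than
    being rewritten inside emit_cmp_and_jump_insns.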

    To handle these three cases the new optabs are added, and the current
    cbranch is no longer required if the target does not need help in
    distinguishing between boolean-vector and data-vector operands.  This
    distinction does not matter for correctness, but it does for
    optimization, so I have chosen not to deprecate cbranch_optab but to
    make it completely optional.  To explain why, consider what happens
    when unrolling is done on Adv. SIMD early-break loops.  We generate:

      vect__1.8_29 = MEM <vector(4) int> [(int *)_25];
      vect__1.9_31 = MEM <vector(4) int> [(int *)_25 + 16B];
      mask_patt_10.10_32 = vect__1.8_29 == { 124, 124, 124, 124 };
      mask_patt_10.10_33 = vect__1.9_31 == { 124, 124, 124, 124 };
      vexit_reduc_34 = .VEC_TRUNC_ADD_HIGH (mask_patt_10.10_33,
                                            mask_patt_10.10_32);
      if (vexit_reduc_34 != { 0, 0, 0, 0 })
        goto <bb 4>; [5.50%]
      else
        goto <bb 18>; [94.50%]

    Here the new optabs are not immediately useful, because the
    comparisons cannot be done by the optab itself.  vec_cbranch_any would
    be called with vexit_reduc_34 and { 0, 0, 0, 0 }, but since it expects
    to perform the comparison itself we end up with

      ldp     q30, q31, [x0], 32
      cmeq    v30.4s, v30.4s, v27.4s
      cmeq    v31.4s, v31.4s, v27.4s
      addhn   v31.4h, v31.4s, v30.4s
      cmtst   v31.4h, v31.4h, v31.4h
      fmov    x3, d31
      cbz     x3, .L2

    instead of

      ldp     q30, q31, [x0], 32
      cmeq    v30.4s, v30.4s, v27.4s
      cmeq    v31.4s, v31.4s, v27.4s
      addhn   v31.4h, v31.4s, v30.4s
      fmov    x3, d31
      cbz     x3, .L2

    because we do not know that the value is already a boolean -1/0 value,
    and without that knowledge we cannot safely omit the compare.  The
    conversion is needed because it is not valid to drop the compare with
    zero when the vector just contains data (restated in plain C in the
    sketch at the end of this comment):

      v30.8h = [ 0x0001, 0x0002, 0x0003, 0x0004,
                 0x0005, 0x0006, 0x0007, 0x0008 ]

      cmeq   v31.8h, v30.8h, #0     // -> v31.8h = [0,0,0,0,0,0,0,0]
      umaxp  v31.4s, v31.4s, v31.4s // pairwise-OR over 0/FFFF masks -> still [0,0,0,0]
      fmov   x7, d31                // x7 = 0
      cbnz   x7, .L6                // NOT taken (correct: there were no zeros)

    vs

      umaxp  v31.4s, v31.4s, v31.4s // pairwise unsigned max:
                                    // [ max(0x00020001,0x00040003)=0x00040003,
                                    //   max(0x00060005,0x00080007)=0x00080007, ... ]
      fmov   x7, d31                // x7 = 0x0008000700040003 (non-zero)
      cbnz   x7, .L6                // TAKEN

    As such, to avoid the extra compare on boolean vectors, we still need
    cbranch_optab, or the new vec_cbranch_* optabs would need an extra
    operand to indicate what kind of data they hold.  Note that this is
    not an issue for SVE, because SVE has BImode for booleans.

    With these optabs it is trivial to implement all the optimizations
    described above.  With them we can now generate

      .L2:
        ldr     q31, [x1, x2]
        add     v29.4s, v29.4s, v25.4s
        add     v28.4s, v28.4s, v26.4s
        add     v31.4s, v31.4s, v30.4s
        str     q31, [x1, x2]
        add     x1, x1, 16
        cmp     x1, 2560
        beq     .L1
      .L6:
        ldr     q30, [x3, x1]
        cmpeq   p15.s, p7/z, z30.s, z27.s
        b.none  .L2

    and easily prove it correct.

    gcc/ChangeLog:

        PR target/118974
        * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab,
        cond_vec_cbranch_any_optab, cond_vec_cbranch_all_optab,
        cond_len_vec_cbranch_any_optab,
        cond_len_vec_cbranch_all_optab): New.
        * doc/md.texi: Document them.
        * optabs.cc (prepare_cmp_insn): Refactor to take the optab to
        check for instead of hardcoded cbranch, and to support mask and
        len.
        (emit_cmp_and_jump_insn_1, emit_cmp_and_jump_insns): Implement
        them.
        (emit_conditional_move, emit_conditional_add, gen_cond_trap):
        Update after changing function signatures to support new optabs.
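
    For reference, the mask-versus-data point above restated as a minimal
    C sketch (illustrative only, not part of the patch), using GNU C
    vector extensions; the helper names are invented.  The branch
    condition being modelled is "any element of v is zero".

      typedef short v8hi __attribute__ ((vector_size (16)));

      /* Correct: first build the boolean mask (the cmeq against #0 gives
         -1 where v[i] == 0), then test whether the reduced mask is
         non-zero.  */
      static inline int
      any_zero (v8hi v)
      {
        v8hi zero = { 0, 0, 0, 0, 0, 0, 0, 0 };
        v8hi m = v == zero;
        long long acc = 0;
        for (int i = 0; i < 8; i++)
          acc |= m[i];
        return acc != 0;
      }

      /* Wrong: reducing the raw data and testing for non-zero answers a
         different question ("is any bit set anywhere").  For
         v = { 1, 2, 3, 4, 5, 6, 7, 8 } it claims a zero element was
         found even though none exists, matching the mistaken cbnz above.
         This is why the expander may only omit the comparison when it
         knows the operand is already a -1/0 boolean mask.  */
      static inline int
      any_zero_broken (v8hi v)
      {
        long long acc = 0;
        for (int i = 0; i < 8; i++)
          acc |= v[i];
        return acc != 0;
      }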
