https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91272
            Bug ID: 91272
           Summary: [SVE] Use fully-masked loops for CLASTB reductions
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-linux-gnu

Tests like clastb_6.c show that we don't yet support CLASTB reductions
in fully-masked/predicated loops.  E.g. the main loop is:

.L3:
        ld1w    z1.s, p0/z, [x0, x2, lsl 2]
        addpl   x1, x2, #2
        incw    x2
        fcmlt   p1.s, p0/z, z1.s, z3.s
        cmp     w2, w3
        clastb  s0, p1, s0, z1.s
        bls     .L3

This loop operates on full vectors only and relies on a scalar loop to
handle the rest.  We should instead support fully-masked loops by ANDing
the comparison result in a CLASTB reduction with the loop mask.  I think
this means:

* making vectorizable_condition apply vect_get_loop_mask for reductions.
  (There might be cases where we want to do this for normal conditions as
  well as for reductions, but that's separate work.)

* relaxing the LOOP_VINFO_CAN_FULLY_MASK_P handling in
  vectorizable_reduction to account for the above.
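
To illustrate why the comparison result has to be ANDed with the loop mask, here is a rough scalar model in C of what a fully-masked version of the loop would compute. It is only a sketch, not GCC code: the fixed vector length VL, the helper names clastb_model and last_below, and the assumption that the source loop has the shape "if (a[i] < limit) last = a[i];" are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed vector length in elements, for illustration only.
   Real SVE code is vector-length agnostic.  */
#define VL 4

/* Scalar model of CLASTB: return the last element of VEC whose predicate
   bit is set, or FALLBACK if no predicate bit is set.  */
static float
clastb_model (const bool pred[VL], float fallback, const float vec[VL])
{
  float res = fallback;
  for (int i = 0; i < VL; ++i)
    if (pred[i])
      res = vec[i];
  return res;
}

/* Model of a fully-masked loop that computes the last A[I] below LIMIT,
   or INIT if there is no such element.  There is no scalar tail loop:
   the final, partial vector is handled by the loop mask.  */
float
last_below (const float *a, size_t n, float limit, float init)
{
  assert (a != NULL || n == 0);
  float res = init;
  for (size_t base = 0; base < n; base += VL)
    {
      bool pred[VL];
      float vec[VL];
      for (int i = 0; i < VL; ++i)
        {
          /* Loop mask: lane I is active only if it is within bounds.  */
          bool loop_mask = base + i < n;
          /* Zeroing predicated load: inactive lanes read as 0.0.  */
          vec[i] = loop_mask ? a[base + i] : 0.0f;
          /* The key step: AND the comparison result with the loop mask,
             so that an inactive lane can never be selected by CLASTB.  */
          pred[i] = loop_mask && vec[i] < limit;
        }
      res = clastb_model (pred, res, vec);
    }
  return res;
}
```

For example, with a = {5, 1, 7, 2, 9, 3}, n = 6 and limit = 4, the second iteration has only two active lanes. Without the AND, the zeroed inactive lanes would compare as less than 4 and the model would return 0 rather than 3.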