https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91272

            Bug ID: 91272
           Summary: [SVE] Use fully-masked loops for CLASTB reductions
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-linux-gnu

Tests like clastb_6.c show that we don't yet support CLASTB
reductions in fully-masked (predicated) loops.  For example,
the main loop is:

.L3:
        ld1w    z1.s, p0/z, [x0, x2, lsl 2]
        addpl   x1, x2, #2
        incw    x2
        fcmlt   p1.s, p0/z, z1.s, z3.s
        cmp     w2, w3
        clastb  s0, p1, s0, z1.s
        bls     .L3

This loop operates on full vectors only and relies on a scalar
loop to handle the rest.
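
For reference, a scalar loop of roughly the following shape leads to this
kind of CLASTB reduction.  This is a sketch inferred from the assembly
above (a float comparison followed by "keep the last matching element"),
not a copy of clastb_6.c; the function name and the `fallback` parameter
are hypothetical.

```c
#include <stddef.h>

/* Conditional "last matching value" reduction: return the last a[i]
   that is less than min_v, or fallback if no element matches.
   The name and the fallback parameter are illustrative only.  */
float
cond_last_scalar (const float *a, size_t n, float min_v, float fallback)
{
  float last = fallback;
  for (size_t i = 0; i < n; ++i)
    if (a[i] < min_v)
      last = a[i];
  return last;
}
```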

We should instead support fully-masked loops by ANDing the
comparison result in a CLASTB reduction with the loop mask.
I think this means:

* making vectorizable_condition apply vect_get_loop_mask
  for reductions.  (There might be cases where we want to do
  this for normal conditions as well as for reductions, but
  that's separate work.)

* relaxing the LOOP_VINFO_CAN_FULLY_MASK_P handling in
  vectorizable_reduction to account for the above.
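
The intended semantics of the fully-masked form can be modelled in plain C
as below.  This is only a lane-by-lane emulation under stated assumptions,
not vectorizer code: VL, model_clastb and cond_last_fully_masked are
hypothetical names, the vector length is fixed at 4 lanes for illustration,
and the loop mask is modelled as "lane index is still within n".

```c
#include <stdbool.h>
#include <stddef.h>

#define VL 4  /* hypothetical vector length in 32-bit lanes */

/* Model of CLASTB: return the element of VEC in the last active lane
   of PRED, or PREV if no lane is active.  */
static float
model_clastb (float prev, const bool *pred, const float *vec)
{
  float res = prev;
  for (size_t lane = 0; lane < VL; ++lane)
    if (pred[lane])
      res = vec[lane];
  return res;
}

/* Fully-masked version of the reduction: no scalar tail loop.  */
float
cond_last_fully_masked (const float *a, size_t n, float min_v,
                        float fallback)
{
  float last = fallback;
  for (size_t i = 0; i < n; i += VL)
    {
      bool loop_mask[VL], cmp_mask[VL];
      float vec[VL];
      for (size_t lane = 0; lane < VL; ++lane)
        {
          /* Loop mask: a lane is active while i + lane < n.  */
          loop_mask[lane] = i + lane < n;
          /* Predicated load with zeroing of inactive lanes.  */
          vec[lane] = loop_mask[lane] ? a[i + lane] : 0.0f;
          /* Comparison result ANDed with the loop mask.  */
          cmp_mask[lane] = loop_mask[lane] && vec[lane] < min_v;
        }
      last = model_clastb (last, cmp_mask, vec);
    }
  return last;
}
```

In this model the AND with the loop mask is what keeps the final partial
vector correct: inactive lanes hold zero after the zeroing load, so with a
positive min_v they would otherwise satisfy the comparison and be picked
up by the CLASTB step.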
