https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97903

            Bug ID: 97903
           Summary: [ARM NEON] Missed optimization in lowering test
                    operation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: prathamesh3492 at gcc dot gnu.org
  Target Milestone: ---

Hi,
For the following test-case:

#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}

uint8x8_t f2(int8x8_t a, int8x8_t b) {
  return vtst_s8 (a, b);
}

Code-gen:

f2:
        vtst.8  d0, d0, d1
        bx      lr


f1:
        vmov.i32        d16, #0  @ v8qi
        vand    d1, d0, d1
        vmov.i32        d17, #0xffffffff  @ v8qi
        vceq.i8 d1, d1, d16
        vbsl    d1, d16, d17
        vmov    d0, d1  @ v8qi
        bx      lr

The optimized dump for f1 shows:
  _1 = a_4(D) & b_5(D);
  _3 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 },
               { -1, -1, -1, -1, -1, -1, -1, -1 },
               { 0, 0, 0, 0, 0, 0, 0, 0 }, 113);
  _6 = VIEW_CONVERT_EXPR<uint8x8_t>(_3);

I think we miss an opportunity to combine the AND followed by the VCOND into a
single vector test instruction (vtst), as f2 does. Should we add a .VTEST
internal function that expands to vtst? Or, alternatively, add a peephole
pattern in the backend?
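
As a sanity check that the transformation would be safe, here is a minimal
scalar sketch of one 8-bit lane. It assumes that vtst.8 sets a lane to all
ones when the two inputs share any set bit and to zero otherwise. The helper
names (lane_and_ne_zero, lane_vtst, check_lanes_agree) are hypothetical and
only model the semantics; they are not GCC or NEON APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what f1 computes per lane:
   AND the inputs, compare against zero, select -1 (0xFF) or 0. */
static uint8_t lane_and_ne_zero(int8_t a, int8_t b) {
  return (uint8_t) (((a & b) != 0) ? 0xFF : 0x00);
}

/* Hypothetical model of one vtst.8 lane (assumed semantics):
   all ones if the inputs have any bit in common, else zero. */
static uint8_t lane_vtst(int8_t a, int8_t b) {
  return ((uint8_t) a & (uint8_t) b) ? 0xFF : 0x00;
}

/* Exhaustively compare both models over all 256 * 256 lane inputs. */
static void check_lanes_agree(void) {
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      assert(lane_and_ne_zero((int8_t) a, (int8_t) b)
             == lane_vtst((int8_t) a, (int8_t) b));
}
```

Calling check_lanes_agree() from a small test driver on any host should pass
without needing arm_neon.h. Under the stated assumption, the AND + VCOND
sequence and vtst agree for every lane value.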

Thanks,
Prathamesh
