Hi,
For the following test-case:

#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}
GCC fails to lower the test operation to vtst, and instead emits:
f1:
        vand    d0, d0, d1
        vceq.i8 d0, d0, #0
        vmvn    d0, d0
        bx      lr
The attached patch tries to fix this by adding a pattern that matches the following combine attempt:
Trying 7, 8 -> 9:
    7: r120:V8QI=r123:V8QI&r124:V8QI
      REG_DEAD r124:V8QI
      REG_DEAD r123:V8QI
    8: r122:V8QI=-r120:V8QI==const_vector
      REG_DEAD r120:V8QI
    9: r121:V8QI=~r122:V8QI
      REG_DEAD r122:V8QI
Failed to match this instruction:
(set (reg:V8QI 121)
    (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
                (reg:V8QI 124))
            (const_vector:V8QI [
                    (const_int 0 [0]) repeated x8
                ]))
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])))
Essentially it converts:
r120 = (and r123 r124)
r122 = (neg (eq r120 0))
r121 = (not r122)
-->
r121 = vtst r123, r124
(I guess it simplifies (not (neg X)) to (plus X -1) above).
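As a sanity check of that guess, here is a minimal scalar sketch in C that models a single 8-bit lane and compares the and/vceq/vmvn sequence, the (plus (eq ...) -1) form, and the vtst result over all input pairs. The helper names (lane_and_ceq_mvn, lane_plus_form, lane_vtst, count_mismatches) are hypothetical and only for illustration, and the model assumes that the RTL eq yields 1 for a true lane, as the neg in insn 8 suggests.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-bit lane of the sequence currently emitted: vand, vceq #0, vmvn. */
uint8_t lane_and_ceq_mvn(uint8_t a, uint8_t b)
{
    uint8_t anded = a & b;
    uint8_t ceq = (anded == 0) ? 0xFF : 0x00; /* vceq.i8 against #0 */
    return (uint8_t)~ceq;                     /* vmvn */
}

/* The form combine tried to match: (plus (eq (and a b) 0) -1).
   Assumption: eq yields 1 for true and 0 for false in each lane. */
uint8_t lane_plus_form(uint8_t a, uint8_t b)
{
    uint8_t eq = ((a & b) == 0) ? 1 : 0;
    return (uint8_t)(eq + 0xFF);              /* adding -1 modulo 256 */
}

/* Per-lane vtst: all ones if (a & b) is non-zero, else zero. */
uint8_t lane_vtst(uint8_t a, uint8_t b)
{
    return ((a & b) != 0) ? 0xFF : 0x00;
}

/* Count the input pairs on which either form disagrees with vtst. */
int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            uint8_t want = lane_vtst(x, y);
            if (lane_and_ceq_mvn(x, y) != want || lane_plus_form(x, y) != want)
                bad++;
        }
    }
    return bad;
}

/* Convenience wrapper: aborts if any lane value disagrees. */
void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Calling self_check() from a small driver should pass silently if the three forms agree on every lane value. This only models lane arithmetic on the host; it says nothing about what the compiler actually emits.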
Code-gen after patch:
f1:
        vtst.8  d0, d0, d1
        bx      lr
Bootstrapped + tested on arm-linux-gnueabihf, and
cross tested on arm*-*-*.
Does it look OK for next stage-1?
Thanks,
Prathamesh
pr97903-1.diff
