https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850
--- Comment #2 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 29.11.2024 at 18:27, tnfchris at gcc dot gnu.org <gcc-bugzi...@gcc.gnu.org> wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850
>
>            Bug ID: 117850
>           Summary: GCC emits DUP, UMULL instead of UMULL2
>           Product: gcc
>           Version: 15.0
>            Status: UNCONFIRMED
>          Keywords: missed-optimization
>          Severity: normal
>          Priority: P3
>         Component: target
>          Assignee: unassigned at gcc dot gnu.org
>          Reporter: tnfchris at gcc dot gnu.org
>                CC: rguenth at gcc dot gnu.org
>  Target Milestone: ---
>            Target: aarch64*
>
> The following example:
>
> #include <arm_neon.h>
>
> uint16x8_t foo(const uint8x16_t s) {
>     const uint8x16_t f0 = vdupq_n_u8(4);
>     return vmull_u8(vget_high_u8(s), vget_high_u8(f0));
> }
>
> compiled with -O3 generates:
>
> foo(__Uint8x16_t):
>         movi    v31.8b, 0x4
>         dup     d0, v0.d[1]
>         umull   v0.8h, v0.8b, v31.8b
>         ret
>
> instead of
>
> foo(__Uint8x16_t):
>         movi    v1.16b, #4
>         umull2  v0.8h, v0.16b, v1.16b
>         ret
>
> I think we can fix this and other cases by lowering them in GIMPLE.
>
> Concretely, the above could be lowered to VEC_WIDEN_MUL and, based on the
> BIT_FIELD_REFs generated by the vget_high's, folded into the proper _lo or _hi
> variant.
>
> To do this, though, we might need to expose valueize to the API so we can look
> at the operands rather than having to chase up the SSA_NAME_DEF_STMT.
>
> Are you ok with this, Richi?

I don't see how this is easier?
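
For readers following the thread, here is a rough scalar model in portable C of why the two instruction sequences in the report should compute the same result. It is only a sketch of lane semantics, not GCC code and not the proposed GIMPLE lowering. All helper names (umull_lo, umull_hi, dup_high_to_low, foo_dup_umull, foo_umull2, sequences_agree, self_check) are made up for illustration, and vectors are modelled as plain byte arrays with lane 0 first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of UMULL: widening multiply of the low 8 byte lanes. */
void umull_lo(const uint8_t a[16], const uint8_t b[16], uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] * b[i]);
}

/* Scalar model of UMULL2: widening multiply of the high 8 byte lanes. */
void umull_hi(const uint8_t a[16], const uint8_t b[16], uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i + 8] * b[i + 8]);
}

/* Model of `dup d0, v0.d[1]`: high 64 bits moved to the low half,
   upper half of the destination cleared. */
void dup_high_to_low(const uint8_t a[16], uint8_t out[16])
{
    memset(out, 0, 16);
    memcpy(out, a + 8, 8);
}

/* Sequence currently emitted: MOVI (64-bit) + DUP + UMULL. */
void foo_dup_umull(const uint8_t s[16], uint8_t k, uint16_t out[8])
{
    uint8_t f0[16] = {0};
    uint8_t hi[16];
    memset(f0, k, 8);            /* movi v31.8b, k */
    dup_high_to_low(s, hi);      /* dup d0, v0.d[1] */
    umull_lo(hi, f0, out);       /* umull v0.8h, v0.8b, v31.8b */
}

/* Desired sequence: MOVI (128-bit) + UMULL2. */
void foo_umull2(const uint8_t s[16], uint8_t k, uint16_t out[8])
{
    uint8_t f0[16];
    memset(f0, k, 16);           /* movi v1.16b, k */
    umull_hi(s, f0, out);        /* umull2 v0.8h, v0.16b, v1.16b */
}

/* Returns 1 when both modelled sequences produce identical lanes. */
int sequences_agree(const uint8_t s[16], uint8_t k)
{
    uint16_t a[8], b[8];
    foo_dup_umull(s, k, a);
    foo_umull2(s, k, b);
    return memcmp(a, b, sizeof a) == 0;
}

/* Checks agreement for every splat constant on one sample input. */
void self_check(void)
{
    uint8_t s[16];
    for (int i = 0; i < 16; i++)
        s[i] = (uint8_t)(i * 17);
    for (int k = 0; k < 256; k++)
        assert(sequences_agree(s, (uint8_t)k));
}
```

Calling self_check() from a test harness exercises the model for all 256 splat values. The model says nothing about which GIMPLE representation makes the UMULL2 form easier to reach, which is the open question in the reply above.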