https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850

--- Comment #2 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 29.11.2024 at 18:27, tnfchris at gcc dot gnu.org 
> <gcc-bugzi...@gcc.gnu.org> wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850
> 
>            Bug ID: 117850
>           Summary: GCC emits DUP, UMULL instead of UMULL2
>           Product: gcc
>           Version: 15.0
>            Status: UNCONFIRMED
>          Keywords: missed-optimization
>          Severity: normal
>          Priority: P3
>         Component: target
>          Assignee: unassigned at gcc dot gnu.org
>          Reporter: tnfchris at gcc dot gnu.org
>                CC: rguenth at gcc dot gnu.org
>  Target Milestone: ---
>            Target: aarch64*
> 
> The following example:
> 
> #include <arm_neon.h>
> 
> uint16x8_t foo(const uint8x16_t s) {        
>    const uint8x16_t f0 = vdupq_n_u8(4);        
>    return vmull_u8(vget_high_u8(s), vget_high_u8(f0));
> }
> 
> compiled with -O3 generates:
> 
> foo(__Uint8x16_t):
>        movi    v31.8b, 0x4
>        dup     d0, v0.d[1]
>        umull   v0.8h, v0.8b, v31.8b
>        ret
> 
> instead of
> 
> foo(__Uint8x16_t):
>        movi    v1.16b, #4
>        umull2  v0.8h, v0.16b, v1.16b
>        ret
> 
> I think we can fix this and other cases by lowering them in GIMPLE.
> 
> Concretely, the above could be lowered to VEC_WIDEN_MUL and, based on the
> BIT_FIELD_REFs generated by the vget_high calls, folded into the proper _lo
> or _hi variant.
> 
> To do this, though, we might need to expose valueize to the API so we can
> look at the operands rather than having to chase up the SSA_NAME_DEF_STMT.
> 
> Are you OK with this, Richi?

I don't see how this is easier?
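
For reference, the equivalence the report relies on (DUP of the high half followed by UMULL gives the same lanes as a single UMULL2) can be sketched in portable C. This is only a scalar model under stated assumptions, not GCC code: the helper names model_umull, model_umull2 and sequences_agree are hypothetical, and the arrays merely stand in for the vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of UMULL: widening multiply of eight u8 lanes
   into eight u16 lanes. (Hypothetical helper, not a GCC API.) */
static void model_umull(const uint8_t a[8], const uint8_t b[8],
                        uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)((uint16_t)a[i] * (uint16_t)b[i]);
}

/* Scalar model of UMULL2: the same widening multiply, but reading
   the upper eight lanes (8..15) of two 16-lane inputs. */
static void model_umull2(const uint8_t a[16], const uint8_t b[16],
                         uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)((uint16_t)a[i + 8] * (uint16_t)b[i + 8]);
}

/* Returns 1 when both instruction sequences from the report produce
   the same result for input s and splat constant k, else 0. */
static int sequences_agree(const uint8_t s[16], uint8_t k)
{
    uint8_t f0[16];
    uint16_t via_dup[8], via_umull2[8];

    for (int i = 0; i < 16; i++)
        f0[i] = k;                        /* models vdupq_n_u8(k) */

    model_umull(s + 8, f0 + 8, via_dup);  /* extract high half, then UMULL */
    model_umull2(s, f0, via_umull2);      /* single UMULL2 */

    for (int i = 0; i < 8; i++)
        if (via_dup[i] != via_umull2[i])
            return 0;
    return 1;
}
```

Calling sequences_agree with any 16-byte input and the constant 4 from the example models the two code sequences quoted above; the model says nothing about where in the compiler the fold should happen.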
> 
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
