https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105816
Robin Dapp <rdapp at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rdapp at gcc dot gnu.org
--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> where we build the vector from scalars (and fail to reject this via costing):
>
> _1 = BIT_FIELD_REF <src1_9(D), 32, 0>;
> _2 = BIT_FIELD_REF <src1_9(D), 32, 32>;
> _3 = BIT_FIELD_REF <src1_9(D), 32, 64>;
> _4 = BIT_FIELD_REF <src1_9(D), 32, 96>;
> _5 = BIT_FIELD_REF <src2_16(D), 32, 0>;
> _6 = BIT_FIELD_REF <src2_16(D), 32, 32>;
> _7 = BIT_FIELD_REF <src2_16(D), 32, 64>;
> _8 = BIT_FIELD_REF <src2_16(D), 32, 96>;
> _21 = {_1, _2, _3, _4, _5, _6, _7, _8};
> vectp.4_22 = &BIT_FIELD_REF <*dst_11(D), 32, 0>;
>
> t.c:6:13: note: Cost model analysis for part in loop 0:
> Vector cost: 48
> Scalar cost: 96
> t.c:6:13: note: Basic block will be vectorized using SLP
>
> Thus re-confirmed.
If I'm not confused, I have a patch for this that extends the
constructor-from-BIT_FIELD_REF optimization to handle two source vectors;
right now we only handle one.
This pattern also shows up in x264, and after "propagation" it decays to a
simple permute (or even to no permute at all). Currently we only catch it in
forwprop, by which point it's too late. I intend to send the patch during
stage 1.
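A minimal standalone sketch of the idea, in plain C and not taken from GCC. Each constructor element is modeled as a (source, bit position) pair, like the BIT_FIELD_REF <src, 32, bitpos> lines quoted above. The elements are then mapped onto a two-input permute mask in which lanes of the second source are offset by the input lane count. The names bfr_elt, ctor_to_perm and perm_is_identity are hypothetical. The sketch assumes equal-width, lane-aligned extracts from source vectors of the same size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CONSTRUCTOR element of the form
   BIT_FIELD_REF <src, elt_bits, bitpos>.  `src` is an opaque id for the
   source vector (e.g. 1 for src1_9, 2 for src2_16). */
struct bfr_elt {
  int src;
  unsigned bitpos;
};

/* Try to express a CONSTRUCTOR of n elements, each elt_bits wide and
   extracted from source vectors of in_nelts lanes, as a two-input
   permute.  On success, mask[i] holds a permute-style index (lanes of the
   first source are 0..in_nelts-1, lanes of the second source are
   in_nelts..2*in_nelts-1), *first and *second receive the source ids
   (*second stays -1 if only one source is used), and the number of
   distinct sources (1 or 2) is returned.  Returns 0 if n is 0, if more
   than two sources are involved, or if an extract is not a whole lane. */
static int ctor_to_perm(const struct bfr_elt *elts, size_t n,
                        unsigned elt_bits, unsigned in_nelts,
                        unsigned *mask, int *first, int *second)
{
  assert(elt_bits != 0);
  *first = -1;
  *second = -1;
  if (n == 0)
    return 0;
  for (size_t i = 0; i < n; i++) {
    if (elts[i].bitpos % elt_bits != 0)
      return 0;                       /* straddles two lanes */
    unsigned lane = elts[i].bitpos / elt_bits;
    if (lane >= in_nelts)
      return 0;                       /* outside the source vector */
    if (*first == -1 || elts[i].src == *first) {
      *first = elts[i].src;
      mask[i] = lane;
    } else if (*second == -1 || elts[i].src == *second) {
      *second = elts[i].src;
      mask[i] = lane + in_nelts;      /* second input: offset lanes */
    } else {
      return 0;                       /* third source: give up */
    }
  }
  return *second == -1 ? 1 : 2;
}

/* An identity mask means the result is just the first source followed
   by the second, so no real shuffle is needed. */
static bool perm_is_identity(const unsigned *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i] != i)
      return false;
  return true;
}
```

Assuming src1 and src2 are 4 x 32-bit vectors, the constructor quoted above (lanes 0..3 of src1_9 followed by lanes 0..3 of src2_16) yields the identity mask {0, 1, 2, 3, 4, 5, 6, 7}. That corresponds to the "or even to no permute at all" case. A constructor that interleaves or reorders lanes would produce a non-identity mask, which corresponds to the "simple permute" case.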