https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-14
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think there are related bugs.  foo1 is optimized OK:

  y_4 = BIT_INSERT_EXPR <x_2(D), f_3(D), 0 (32 bits)>;
  return y_4;

while foo is expanded from

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <x_7(D), 32, 32>;
  _2 = BIT_FIELD_REF <x_7(D), 32, 64>;
  _3 = BIT_FIELD_REF <x_7(D), 32, 96>;
  y_6 = {f_5(D), _1, _2, _3};
  return y_6;

tree forwprop contains code that pattern-matches on vector CONSTRUCTORs;
it could be extended to handle this case, I think.  IIRC it can already
detect arbitrary two-vector permutes.  For the above we could go through
an intermediate

  _1 = {f_5(D), f_5(D), ... };
  y_6 = VEC_PERM <_1, x_7(D), { .... }>;

and recognize permutes that only replace a single vector element.

So I think we should optimize

__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = __extension__ (__v4sf) { f, x[2], x[1], x[3] };
  return y;
}

as well, first permuting x and then inserting f (at any position).
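For illustration only, the two lowerings suggested above can be written by
hand at the source level with GCC's generic vector extensions.  This is a
sketch, not what forwprop emits: the typedefs v4sf/v4si and the function
names ctor_form, permute_then_insert and splat_then_permute are made up for
this example, and __builtin_shuffle is GCC-specific.  All three functions
should return the same vector.

```c
/* Local typedefs for this sketch (hypothetical names, not from the PR).  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* The CONSTRUCTOR form from the comment: f goes into lane 0 and
   lanes 1 and 2 of x are swapped.  */
v4sf
ctor_form (v4sf x, float f)
{
  v4sf y = { f, x[2], x[1], x[3] };
  return y;
}

/* "First permuting x and then inserting f": a single-vector permute
   followed by a one-element insert into lane 0.  */
v4sf
permute_then_insert (v4sf x, float f)
{
  const v4si mask = { 0, 2, 1, 3 };
  v4sf y = __builtin_shuffle (x, mask);
  y[0] = f;
  return y;
}

/* The intermediate form: splat f, then a two-vector permute.
   Mask indices 0..3 select from splat, 4..7 select from x.  */
v4sf
splat_then_permute (v4sf x, float f)
{
  v4sf splat = { f, f, f, f };
  const v4si mask = { 0, 6, 5, 7 };
  return __builtin_shuffle (splat, x, mask);
}
```

Only lane 0 of the two-vector mask refers to the splat operand, which is the
"permute that only replaces a single vector element" shape the comment
proposes to recognize.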