https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124649
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 64039
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64039&action=edit
Reduced C testcase

Note I think there is a missing optimization too.
With:
  vect__3.11_44 = .MASK_LOAD (c.1_16, 32B, { -1, 0, ... }, { 0, ... });
  vect__6.12_45 = (vector([4,4]) char) vect__3.11_44;
  .MASK_STORE (g_21(D), 8B, { -1, 0, ... }, vect__6.12_45);

Most likely this should be optimized into the scalar load followed by a scalar store.
But that is for a different issue.
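
For illustration only, here is a rough C model of why the three statements above should be equivalent to one scalar load and one scalar store when the mask is { -1, 0, ... }. This is not the attached testcase. The fixed lane count, the int element type for the load (guessed from the 32B alignment operand and the conversion to char), and all function names are assumptions made for the sketch. The real vector([4,4]) type is variable-length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed lane count, used only to model the vectors.  */
enum { LANES = 4 };

/* Model of .MASK_LOAD: active lanes read memory, inactive lanes take
   the "else" value (all zeros in the dump) and do not touch memory.  */
static void
mask_load (int *dst, const int *src, const int *mask, const int *els)
{
  for (size_t i = 0; i < LANES; i++)
    dst[i] = mask[i] ? src[i] : els[i];
}

/* Model of .MASK_STORE: only active lanes write memory.  */
static void
mask_store (char *dst, const int *mask, const char *val)
{
  for (size_t i = 0; i < LANES; i++)
    if (mask[i])
      dst[i] = val[i];
}

/* The three statements from the dump, with only lane 0 active.  */
static void
vector_form (char *g, const int *c)
{
  const int mask[LANES] = { -1, 0, 0, 0 };
  const int els[LANES] = { 0 };
  int loaded[LANES];
  char narrowed[LANES];

  assert (g != NULL && c != NULL);
  mask_load (loaded, c, mask, els);
  for (size_t i = 0; i < LANES; i++)
    narrowed[i] = (char) loaded[i];
  mask_store (g, mask, narrowed);
}

/* The suggested replacement: one scalar load, one scalar store.  */
static void
scalar_form (char *g, const int *c)
{
  *g = (char) *c;
}

/* Returns 1 if both forms leave the destination in the same state.  */
int
forms_agree (void)
{
  const int c[LANES] = { 65, 66, 67, 68 };
  char g1[LANES] = { 'w', 'x', 'y', 'z' };
  char g2[LANES] = { 'w', 'x', 'y', 'z' };

  vector_form (g1, c);
  scalar_form (g2, c);
  return memcmp (g1, g2, sizeof g1) == 0;
}
```

In this model only element 0 of the destination changes in either form, which is the property the suggested folding would rely on.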
