https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---

Note I think GCC should be able to vectorize this loop, but it goes wrong. With SVE, the comparison against 7 gets lost:

```
vect__3.12_54 = .MASK_LOAD (_48, 16B, loop_mask_52);
_32 = cond_17(D) + POLY_INT_CST [16, 16];
_25 = &MEM <vector([8,8]) short unsigned int> [(uint16_t *)_32 + ivtmp_77 * 2];
vect__3.13_56 = .MASK_LOAD (_25, 16B, loop_mask_53);
_1 = &MEM <vector([16,16]) signed char> [(int8_t *)src_18(D) + ivtmp_77 * 1];
vect_pretmp_29.16_60 = .MASK_LOAD (_1, 8B, loop_mask_59);
mask__14.19_66 = vect__3.12_54 > { 2, ... };
mask__14.19_67 = vect__3.13_56 > { 2, ... };
mask_patt_4.20_68 = VEC_PACK_TRUNC_EXPR <mask__14.19_66, mask__14.19_67>;
vect_array.23 ={v} {CLOBBER};
vect_array.23[0] = vect_pretmp_29.16_60;
vect_array.23[1] = vect_pretmp_29.16_60;
vec_mask_and_74 = loop_mask_59 & mask_patt_4.20_68;
_2 = ivtmp_77 * 2;
_3 = &MEM <vector([16,16]) signed char[2]> [(int8_t *)dest_19(D) + _2 * 1];
```

But RISC-V is able to vectorize it correctly:

```
vect__3.12_52 = .MASK_LEN_LOAD (vectp_cond.10_13, 16B, { -1, ... }, _72, 0);
vect_pretmp_29.15_56 = .MASK_LEN_LOAD (vectp_src.13_54, 8B, { -1, ... }, _72, 0);
mask__27.16_58 = vect__3.12_52 <= { 7, ... };
.MASK_LEN_SCATTER_STORE (vectp_dest.17_60, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__27.16_58, _72, 0);
mask__14.19_64 = vect__3.12_52 > { 2, ... };
.MASK_LEN_SCATTER_STORE (vectp_dest.20_67, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__14.19_64, _72, 0);
```

It does so by using two separate scatter stores here, one per condition.
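
For reference, the scalar loop that these dumps appear to come from can be sketched roughly as below. This is a reconstruction from the GIMPLE only, not the testcase attached to the bug: the function name `store_pairs`, the parameter `n`, and the assumption that the second scatter store targets the odd elements (`dest + 1`) are mine. The element types and the two conditions (`<= 7` and `> 2`) are taken from the RISC-V dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the loop being vectorized.
   Each input element may be stored to two interleaved slots of dest,
   each slot guarded by its own condition on cond[i].  */
void store_pairs(int8_t *restrict dest, const int8_t *restrict src,
                 const uint16_t *restrict cond, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* First store: corresponds to mask__27 (cond <= 7).  */
        if (cond[i] <= 7)
            dest[i * 2] = src[i];
        /* Second store: corresponds to mask__14 (cond > 2).
           Assumed to write the odd slot.  */
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}
```

Because the two stores have different masks, they cannot simply be merged into one store of both lanes under the `> 2` mask alone, which is what the SVE dump seems to be doing.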