https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122219
--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For comment #8 with `-O2 -fstack-reuse=none -flifetime-dse=0`, it takes more
than 2 LIM passes to do full store motion here.
LIM2 moves:
*v_10(D) = v__lsm.16_23; // v_14
Then sink moves:
v_14 = MEM[(union simde__m256_private *)&a_];
And then LIM4 moves:
MEM <simde__m256> [(char * {ref-all})&a_] = a___lsm.23_33;
MEM <uint128_t> [(union simde__m256_private *)&a_ + 16B] = a___lsm.24_23;
But it still has:
MEM <uint128_t> [(union simde__m256_private *)&r_] = _15;
v_16 = MEM[(union simde__m256_private *)&r_];
a___lsm.23_21 = v_16;
_11 = VIEW_CONVERT_EXPR<uint128_t>(f1_9);
a___lsm.24_12 = _11;
Let me see if I can get a testcase for this.
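
For reference, a generic sketch of what store motion does, written by hand
in C. This is not the reduced testcase for this bug; the function names, the
`out_lsm` temporary, and the `restrict` qualifier are assumptions made for
illustration only. `sum_after` is roughly what LIM would aim to produce from
`sum_before` when it can prove `*out` does not alias the loads in the loop:

```c
#include <assert.h>

/* Hypothetical example, not taken from the bug report. */

/* Before: *out is loaded and stored on every iteration. */
static void sum_before(int *restrict out, const int *restrict in, int n)
{
    *out = 0;
    for (int i = 0; i < n; i++)
        *out += in[i];
}

/* After (hand-written): the memory reference is replaced by a scalar
   temporary, similar in spirit to the *_lsm.N names in the dumps above,
   and a single store is emitted after the loop. */
static void sum_after(int *restrict out, const int *restrict in, int n)
{
    int out_lsm = 0;
    for (int i = 0; i < n; i++)
        out_lsm += in[i];
    *out = out_lsm;
}

/* Checks that both versions store the same value for a given input. */
static void check_same(const int *in, int n)
{
    int a, b;
    sum_before(&a, in, n);
    sum_after(&b, in, n);
    assert(a == b);
}
```

The dumps above show the same idea needing several rounds, since each
moved store exposes another one for a later pass.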