https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122219

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For comment #8 with `-O2 -fstack-reuse=none -flifetime-dse=0`, it takes more
than two LIM passes to do full store motion here.
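
For readers less familiar with the term, here is a minimal sketch of what
store motion does, written by hand in C. The function names and the `v_lsm`
temporary are hypothetical (the temporary is only named to echo the `lsm`
temporaries in the dumps), and this is not the testcase from this bug. It
assumes `v` does not alias `src`.

```c
/* Before: *v is loaded and stored on every iteration of the loop. */
static void accumulate_in_memory(int *v, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        *v = *v + src[i];
}

/* After: a hand-written equivalent of what store motion aims for.
   The value lives in a temporary during the loop and is written
   back once after the loop. Assumes v does not alias src. */
static void accumulate_in_register(int *v, const int *src, int n)
{
    int v_lsm = *v;          /* load hoisted in front of the loop */
    for (int i = 0; i < n; i++)
        v_lsm += src[i];     /* no memory access to *v in the loop */
    *v = v_lsm;              /* store moved to after the loop */
}
```

Both functions should leave the same value in `*v` as long as the no-alias
assumption holds.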

LIM2 moves:
  *v_10(D) = v__lsm.16_23; // v_14
Then sink moves:
  v_14 = MEM[(union simde__m256_private *)&a_];

And then LIM4 moves:
  MEM <simde__m256> [(char * {ref-all})&a_] = a___lsm.23_33;
  MEM <uint128_t> [(union simde__m256_private *)&a_ + 16B] = a___lsm.24_23;

But it still has:

  MEM <uint128_t> [(union simde__m256_private *)&r_] = _15;
  v_16 = MEM[(union simde__m256_private *)&r_];
  a___lsm.23_21 = v_16;
  _11 = VIEW_CONVERT_EXPR<uint128_t>(f1_9);
  a___lsm.24_12 = _11;

Let me see if I can get a testcase for this.
