https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #11 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
With the original test case, -mcpu=power8 is problematic because of the use of
the "swapping stores," whose RHS is a vec_select rather than a register or
subreg.  This prevents us from saving the RHS of the store for use in replacing
subsequent loads, running afoul of this logic in dse.c:record_store ():

  if (GET_CODE (body) == SET
      /* No place to keep the value after ra.  */
      && !reload_completed
      && (REG_P (SET_SRC (body))                    <= this part
          || GET_CODE (SET_SRC (body)) == SUBREG
          || CONSTANT_P (SET_SRC (body)))
      && !MEM_VOLATILE_P (mem)
      /* Sometimes the store and reload is used for truncation and
         rounding.  */
      && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))

We can circumvent this if we can use stvx to force the parameters to the stack,
which is legal since the stack slots are properly aligned.  However, even using
-mcpu=power9, we don't handle removing the stores and replacing the partial
loads with register logic.
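
To make the failing clause concrete, here is a rough standalone model of the
condition quoted above.  It is not GCC code: the enum, the struct, and the
function names are all made up for illustration, and real RTL carries far more
information than these few flags.  It only shows that a store whose source is a
vec_select falls through the REG/SUBREG/constant test, so its value is not
remembered for later load replacement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the shape of a store's SET_SRC.
   These are NOT GCC's real rtx codes.  */
enum rhs_kind { RHS_REG, RHS_SUBREG, RHS_CONSTANT, RHS_VEC_SELECT };

/* Hypothetical flattened view of the facts the quoted condition tests.  */
struct store_facts
{
  bool is_set;            /* GET_CODE (body) == SET */
  bool reload_completed;  /* register allocation has already run */
  enum rhs_kind rhs;      /* shape of SET_SRC (body) */
  bool mem_volatile;      /* MEM_VOLATILE_P (mem) */
  bool float_mode;        /* FLOAT_MODE_P (GET_MODE (mem)) */
  bool flag_float_store;  /* -ffloat-store in effect */
};

/* Mirrors the quoted condition: true when the stored value could be
   kept around for replacing subsequent loads.  */
static bool
can_remember_rhs (const struct store_facts *s)
{
  return s->is_set
         && !s->reload_completed
         && (s->rhs == RHS_REG
             || s->rhs == RHS_SUBREG
             || s->rhs == RHS_CONSTANT)
         && !s->mem_volatile
         && !(s->float_mode && s->flag_float_store);
}

/* Contrast a plain register store with a "swapping store" whose
   source is modeled as a vec_select.  */
static void
check_examples (void)
{
  struct store_facts reg_store
    = { true, false, RHS_REG, false, false, false };
  struct store_facts swap_store
    = { true, false, RHS_VEC_SELECT, false, false, false };

  assert (can_remember_rhs (&reg_store));    /* RHS is a register */
  assert (!can_remember_rhs (&swap_store));  /* RHS is a vec_select */
}
```

In this model the only difference between the two stores is the shape of the
RHS, which is the point of the "<= this part" marker above.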