As noted in the last patch, rs6000_rtx_costs ought to cost slow unaligned mems. This stops combine merging a load/store with a mode-changing SET subreg when the load/store in the subreg mode would be slow. Costing slow mems at 100 insns is just an order-of-magnitude estimate. (The alignment interrupt does cost quite a lot. Experiments on power8 with a misaligned lwarx showed taking the alignment interrupt cost roughly 300 insns.)
Bootstrapped and regression tested powerpc64le-linux and powerpc64-linux.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Make unaligned mem
	cost more.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5b9aae2..2ae3e7e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -34336,11 +34336,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case CONST:
     case HIGH:
     case SYMBOL_REF:
+      *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
+      return true;
+
+    case MEM:
       /* When optimizing for size, MEM should be slightly more expensive
	 than generating address, e.g., (plus (reg) (const)).  L1 cache
	 latency is about two instructions.  */
       *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
+      if (SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (x)))
+	*total += COSTS_N_INSNS (100);
       return true;

     case LABEL_REF:

-- 
Alan Modra
Australia Development Lab, IBM