https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456
--- Comment #12 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
The problem is this declaration in rs6000.h, which forces unaligned vector
stores to be scalarized during expand:

/* Define this macro to be the value 1 if unaligned accesses have a cost
   many times greater than aligned accesses, for example if they are
   emulated in a trap handler.  */
/* Altivec vector memory instructions simply ignore the low bits; SPE
   vector memory instructions trap on unaligned accesses; VSX memory
   instructions are aligned to 4 or 8 bytes.  */
#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN)                              \
  (STRICT_ALIGNMENT                                                     \
   || (((MODE) == SFmode || (MODE) == DFmode || (MODE) == TFmode        \
        || (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode)    \
       && (ALIGN) < 32)                                                 \
   || (VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE))))

The last condition needs to be relaxed for POWER8 hardware.
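
To make the effect of that last condition concrete, here is a small
standalone model of the predicate. It is only a sketch and not GCC code:
the toy_* names, the fast_unaligned_vectors flag, and the fixed 128-bit
vector alignment are all assumptions made for illustration, and the
eventual patch may express the relaxation differently. It contrasts the
predicate as quoted above with one possible relaxed form in which the
vector clause is skipped when the target handles unaligned vector
accesses efficiently.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC machine modes (hypothetical, not the real enum). */
enum toy_mode { TOY_SImode, TOY_DFmode, TOY_V4SImode };

/* Hypothetical target description; both fields are assumptions. */
struct toy_target {
    bool strict_alignment;        /* models STRICT_ALIGNMENT */
    bool fast_unaligned_vectors;  /* assumed true for POWER8-class hardware */
};

static bool toy_is_float_mode(enum toy_mode m)  { return m == TOY_DFmode; }
static bool toy_is_vector_mode(enum toy_mode m) { return m == TOY_V4SImode; }

/* Models VECTOR_ALIGN; 128 bits is an assumed value for this sketch. */
static int toy_vector_align(enum toy_mode m) { (void)m; return 128; }

/* Same shape as the macro quoted above: any under-aligned vector access
   is reported as slow.  align_bits is the known alignment in bits. */
static bool slow_unaligned_current(const struct toy_target *t,
                                   enum toy_mode m, int align_bits)
{
    return t->strict_alignment
        || (toy_is_float_mode(m) && align_bits < 32)
        || (toy_is_vector_mode(m) && align_bits < toy_vector_align(m));
}

/* One possible relaxation: the vector clause applies only when the
   target lacks efficient unaligned vector accesses. */
static bool slow_unaligned_relaxed(const struct toy_target *t,
                                   enum toy_mode m, int align_bits)
{
    return t->strict_alignment
        || (toy_is_float_mode(m) && align_bits < 32)
        || (toy_is_vector_mode(m)
            && !t->fast_unaligned_vectors
            && align_bits < toy_vector_align(m));
}
```

With a target where strict_alignment is false and fast_unaligned_vectors
is true, a V4SI access known to be only 32-bit aligned is reported as slow
by slow_unaligned_current but not by slow_unaligned_relaxed. The scalar
floating-point clause and the STRICT_ALIGNMENT clause behave the same in
both versions.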