https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88027
            Bug ID: 88027
           Summary: PowerPC generates slightly weird code for memset
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

If the PowerPC GCC compiler is doing a memset operation to clear some aligned
memory, it will do most of the stores using vector stores, but the last quad
word will be done using GPR stores.

Consider the following code:

    struct st {
      vector double a[4];
    };

    long bar (struct st *p)
    {
      __builtin_memset ((void *) p, '\0', sizeof (struct st));
      return 0;
    }

GCC -O3 -mcpu=power9 generates:

    bar:
            xxspltib 0,0            ; 0 in fpr0 (aka vsr0)
            li 9,0                  ; 0 in gpr9
            std 9,48(3)             ; store the last 2 double words as GPRs
            std 9,56(3)
            stxv 0,0(3)             ; store the first 3 quad words as vectors
            stxv 0,16(3)
            stxv 0,32(3)
            blr

GCC -O3 -mcpu=power8 generates:

    bar:
            vspltisw 0,0            ; 0 in v0 (aka vsr32)
            li 9,0                  ; 0 in gpr9
            li 8,16                 ; index for 2nd quad word
            li 10,32                ; index for 3rd quad word
            xxpermdi 12,32,32,2     ; word swap (should be optimized out)
            std 9,48(3)             ; store last doubleword -1 as GPR
            stxvd2x 12,0,3          ; store first quad word as vector
            stxvd2x 12,3,8          ; store second quad word as vector
            std 9,56(3)             ; store last double word as GPR
            stxvd2x 12,3,10         ; store third quad word as vector
            blr

In addition to switching between storing as GPRs and as vectors, some machines
prefer the stores to be in ascending order for better optimization.
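
For comparison, here is a minimal sketch of the memset form next to a variant that writes the four quad words explicitly in ascending order. It is not part of the original report. The names v2df, bar_memset, bar_explicit, is_cleared and check_both_clear are hypothetical, and the generic GCC vector_size attribute stands in for the report's "vector double" so the sketch also builds on non-PowerPC targets. It only checks that both forms leave the same bytes in memory; the instructions GCC emits for either form depend on the compiler version and the -mcpu setting.

```c
#include <assert.h>
#include <string.h>

/* Generic GCC vector type standing in for the report's `vector double`
   (an assumption, so that the sketch is not PowerPC-specific).  */
typedef double v2df __attribute__ ((vector_size (16)));

struct st { v2df a[4]; };

/* The form from the report: one memset over the whole struct.  */
long
bar_memset (struct st *p)
{
  __builtin_memset ((void *) p, '\0', sizeof (struct st));
  return 0;
}

/* Hypothetical comparison: four quad-word stores written in
   ascending address order at the source level.  */
long
bar_explicit (struct st *p)
{
  const v2df zero = { 0.0, 0.0 };
  for (int i = 0; i < 4; i++)
    p->a[i] = zero;
  return 0;
}

/* Return 1 if every byte of *p is zero, else 0.  */
int
is_cleared (const struct st *p)
{
  static const struct st zero_st;   /* static storage, so all zero */
  return memcmp (p, &zero_st, sizeof zero_st) == 0;
}

/* Fill two structs with a nonzero pattern, clear one with each
   variant, and check that the results are identical.  */
void
check_both_clear (void)
{
  struct st x, y;
  memset (&x, 0xff, sizeof x);
  memset (&y, 0xff, sizeof y);
  bar_memset (&x);
  bar_explicit (&y);
  assert (is_cleared (&x));
  assert (is_cleared (&y));
  assert (memcmp (&x, &y, sizeof x) == 0);
}
```

Compiling both functions with -S at the same optimization level and -mcpu value would allow the store sequences to be compared side by side.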