http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55653
             Bug #: 55653
           Summary: Unnecessary initialization of vector register
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: josh.m.con...@gmail.com

When initializing all lanes of a vector register, I notice that the register is first initialized to zero and then all lanes of the vector are independently initialized, resulting in extra code.

Specifically, I'm looking at the aarch64 target, with the following source:

    void
    fmla_loop (double * restrict result, double * restrict mul1,
               double mul2, int size)
    {
      int i;

      for (i = 0; i < size; i++)
        result[i] = result[i] + mul1[i] * mul2;
    }

Compiled with:

    aarch64-linux-gnu-gcc -std=c99 -O3 -ftree-vectorize -S -o test.s test.c

The resultant code to initialize a vector register with two instances of mul2 is:

            adr     x3, .LC0
            ld1     {v3.2d}, [x3]
            ins     v3.d[0], v0.d[0]
            ins     v3.d[1], v0.d[0]
            ...
    .LC0:
            .word   0
            .word   0
            .word   0
            .word   0

Where the first two instructions (that initialize the vector register) are unnecessary, as is the space for .LC0.

Note that this initialization is being performed here in store_constructor:

    /* Inform later passes that the old value is dead. */
    if (!cleared && !vector && REG_P (target))
      emit_move_insn (target, CONST0_RTX (GET_MODE (target)));

right after another check to see if the vector needs to be cleared out (and determine that it doesn't).

Instead of the emit_move_insn, that code used to be:

    emit_insn (gen_rtx_CLOBBER (VOIDmode, target));

But was changed in r101169, with the comment:

    "The expr.c change elides an extra move that's creeped in since we
    changed clobbered values to get new registers in reload."
(see full checkin text here: http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01584.html)

It's not clear to me whether this can be changed back, or if later passes should be recognizing this initialization as redundant, or whether we need a new expand pattern to match vector fill (vector duplicate). At any rate, the code is certainly not ideal as it stands.

Thanks!
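
For anyone narrowing this down, the following is a sketch of a smaller test that may isolate the vector fill from the loop vectorizer. It uses GCC's generic vector extension. The names v2df, splat2 and check_splat2 are hypothetical and chosen only for illustration, and it has not been confirmed that this constructor expands through the same store_constructor path or yields the same adr/ld1/ins sequence on aarch64.

```c
#include <assert.h>

/* Hypothetical reduced test; v2df, splat2 and check_splat2 are
   illustrative names, not taken from GCC or its testsuite.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Build a two-lane vector with every lane set to x (a "vector
   duplicate").  Both lanes are written explicitly, so zeroing the
   register beforehand should not be required.  */
v2df
splat2 (double x)
{
  v2df v = { x, x };
  return v;
}

/* Check that both lanes of the duplicated vector hold x.  */
void
check_splat2 (double x)
{
  v2df v = splat2 (x);
  assert (v[0] == x);
  assert (v[1] == x);
}
```

Compiling this with the same options as the original test and inspecting the body of splat2 in the generated assembly would show whether a zero load from a constant pool entry precedes the lane inserts there as well.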