During the base register initialization, when eliminating the load instruction, we were in some cases calling `emit_move_insn` on registers of the same size but with different modes, causing an ICE.
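For context, a minimal sketch of the kind of sequence `-favoid-store-forwarding` rewrites; this is a hypothetical illustration (the function names and layout are made up, and the actual trigger for this PR is the vector testcase added below), not the reproducer itself:

/* Hypothetical illustration: two narrow stores followed by a wider load
   that they fully cover.  The pass can eliminate the load and rebuild its
   value from the stored registers; the store at offset 0 supplies the base
   register whose initialization this patch changes.  */
void use (int);

void
example (char *p)
{
  p[0] = 1;            /* store at offset 0 relative to the load below  */
  p[1] = 2;            /* store at offset 1                             */
  use (*(short *) p);  /* wider load forwarded from the two stores      */
}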
This patch uses `lowpart_subreg` for the base register initialization
instead of zero-extending it.  We had tried this approach before, but it
left undefined bytes in the upper part of the register.  That should not
happen, as we are supposed to write the whole register when the load is
eliminated.  It was happening because, when there were multiple stores
with the same offset as the load, a register move was generated for each
of them, overwriting the bit inserts that had been emitted before them.
With this patch we generate a register move only for the first store of
this kind and use a bit insert for the rest of them.

Bootstrapped/regtested on AArch64 and x86_64.

	PR rtl-optimization/119884

gcc/ChangeLog:

	* avoid-store-forwarding.cc (process_store_forwarding): Use
	`lowpart_subreg` for the base register initialization, only for
	the first store that has the same offset as the load.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr119884.c: New test.
---
 gcc/avoid-store-forwarding.cc            | 28 ++++++++++++++++++------
 gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++++++++++
 2 files changed, 34 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index ded8d7e596e0..90e6563e7b26 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -225,24 +225,35 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
 
   int move_to_front = -1;
   int total_cost = 0;
+  unsigned int curr_zero_offset_count = 0;
+
+  /* Count the stores with zero offset.  */
+  unsigned int zero_offset_store_num = 0;
+  FOR_EACH_VEC_ELT (stores, i, it)
+    {
+      if (it->offset == 0)
+        zero_offset_store_num++;
+    }
 
   /* Check if we can emit bit insert instructions for all forwarded stores.  */
   FOR_EACH_VEC_ELT (stores, i, it)
     {
       it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
       rtx_insn *insns = NULL;
+      const bool has_zero_offset = it->offset == 0;
+      const bool is_first_store_in_chain
+        = curr_zero_offset_count == (zero_offset_store_num - 1);
 
       /* If we're eliminating the load then find the store with zero offset
-         and use it as the base register to avoid a bit insert if possible.  */
-      if (load_elim && it->offset == 0)
+         and use it as the base register to avoid a bit insert if possible.
+         If there are multiple stores with zero offset, do it for the first
+         one only (the last in the reversed vector).  */
+      if (load_elim && has_zero_offset && is_first_store_in_chain)
         {
           start_sequence ();
 
-          machine_mode dest_mode = GET_MODE (dest);
-          rtx base_reg = it->mov_reg;
-          if (known_gt (GET_MODE_BITSIZE (dest_mode),
-                        GET_MODE_BITSIZE (GET_MODE (it->mov_reg))))
-            base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
+          rtx base_reg = lowpart_subreg (GET_MODE (dest), it->mov_reg,
+                                         GET_MODE (it->mov_reg));
 
           if (base_reg)
             {
@@ -257,6 +268,9 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
           end_sequence ();
         }
 
+      if (has_zero_offset)
+        curr_zero_offset_count++;
+
       if (!insns)
         insns = generate_bit_insert_sequence (&(*it), dest);
 
diff --git a/gcc/testsuite/gcc.target/i386/pr119884.c b/gcc/testsuite/gcc.target/i386/pr119884.c
new file mode 100644
index 000000000000..34d5b689244d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr119884.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-dse -favoid-store-forwarding" } */
+
+typedef __attribute__((__vector_size__(64))) char V;
+char c;
+V v;
+
+char
+foo()
+{
+  v *= c;
+  return v[0];
+}
\ No newline at end of file
-- 
2.49.0