Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
> During the base register initialization, when we are eliminating the
> load instruction, we were calling `emit_move_insn` on registers of
> the same size but of different modes in some cases, causing an ICE.
>
> This patch uses `lowpart_subreg` for the base register
> initialization, instead of zero-extending it.  We had tried this
> solution before, but it left undefined bytes in the upper part of
> the register.  That shouldn't happen, as we are supposed to write
> the whole register when the load is eliminated.  It was happening
> because, when there were multiple stores with the same offset as
> the load, we generated a register move for every one of them, each
> overwriting the bit inserts emitted before it.  With this patch we
> generate a register move only for the first store of this kind and
> use a bit insert for the rest.
That feels wrong though.  If there are multiple stores to the same
offset then it becomes a question of which bytes of which stores
survive until the load.  E.g. for a QI store followed by an HI store
followed by an SI store, the final SI store wins and the previous
ones should be ignored.  If it's QI, SI, HI, then for little endian,
the low two bytes come from the HI and the next two bytes come from
the SI.  The QI store should again be ignored.  So I would expect
this to depend on which store is widest, with ties broken by picking
later stores (i.e. those earlier in the list).

I'm also not sure why this is only a problem with using lowparts.
Wouldn't the same issue apply when using zero_extend?  The bytes are
fully-defined for zero_extend, but not necessarily to the right
values.

Did you consider making the main:

  /* Check if we can emit bit insert instructions for all forwarded stores.  */
  FOR_EACH_VEC_ELT (stores, i, it)
    {

loop also maintain a bitmap of bytes that still need to be forwarded,
so that we can skip stores that contribute nothing?  (A rough
standalone sketch of this idea appears at the end of this message.)
Later RTL optimisations might not be able to work that out in all
cases, and in any case it would be good to avoid redundant operations
given that we have the information to hand.

Thanks,
Richard

>
> Bootstrapped/regtested on AArch64 and x86_64.
>
> PR rtl-optimization/119884
>
> gcc/ChangeLog:
>
> 	* avoid-store-forwarding.cc (process_store_forwarding):
> 	Use `lowpart_subreg` for the base register initialization,
> 	only for the first store that has the same offset as the
> 	load.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/pr119884.c: New test.
> ---
>  gcc/avoid-store-forwarding.cc            | 28 ++++++++++++++++++------
>  gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++++++++++
>  2 files changed, 34 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c
>
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> index ded8d7e596e0..90e6563e7b26 100644
> --- a/gcc/avoid-store-forwarding.cc
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -225,24 +225,35 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
>  
>    int move_to_front = -1;
>    int total_cost = 0;
> +  unsigned int curr_zero_offset_count = 0;
> +
> +  /* Count the stores with zero offset.  */
> +  unsigned int zero_offset_store_num = 0;
> +  FOR_EACH_VEC_ELT (stores, i, it)
> +    {
> +      if (it->offset == 0)
> +        zero_offset_store_num++;
> +    }
>  
>    /* Check if we can emit bit insert instructions for all forwarded stores.  */
>    FOR_EACH_VEC_ELT (stores, i, it)
>      {
>        it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
>        rtx_insn *insns = NULL;
> +      const bool has_zero_offset = it->offset == 0;
> +      const bool is_first_store_in_chain
> +        = curr_zero_offset_count == (zero_offset_store_num - 1);
>  
>        /* If we're eliminating the load then find the store with zero offset
> -         and use it as the base register to avoid a bit insert if possible.  */
> -      if (load_elim && it->offset == 0)
> +         and use it as the base register to avoid a bit insert if possible.
> +         If there are multiple stores with zero offset, do it for the first
> +         one only (the last in the reversed vector).  */
> +      if (load_elim && has_zero_offset && is_first_store_in_chain)
>          {
>            start_sequence ();
>  
> -          machine_mode dest_mode = GET_MODE (dest);
> -          rtx base_reg = it->mov_reg;
> -          if (known_gt (GET_MODE_BITSIZE (dest_mode),
> -                        GET_MODE_BITSIZE (GET_MODE (it->mov_reg))))
> -            base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
> +          rtx base_reg = lowpart_subreg (GET_MODE (dest), it->mov_reg,
> +                                         GET_MODE (it->mov_reg));
>  
>            if (base_reg)
>              {
> @@ -257,6 +268,9 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
>            end_sequence ();
>          }
>  
> +      if (has_zero_offset)
> +        curr_zero_offset_count++;
> +
>        if (!insns)
>          insns = generate_bit_insert_sequence (&(*it), dest);
>  
> diff --git a/gcc/testsuite/gcc.target/i386/pr119884.c b/gcc/testsuite/gcc.target/i386/pr119884.c
> new file mode 100644
> index 000000000000..34d5b689244d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119884.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-dse -favoid-store-forwarding" } */
> +
> +typedef __attribute__((__vector_size__(64))) char V;
> +char c;
> +V v;
> +
> +char
> +foo()
> +{
> +  v *= c;
> +  return v[0];
> +}
> \ No newline at end of file
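
A rough, self-contained sketch of the bitmap idea suggested above, in
plain C++ standing in for the pass's vec and bitmap machinery;
`store_info` and every other name below is illustrative only, not
taken from avoid-store-forwarding.cc.  Stores are visited latest-first,
mirroring the reversed `stores` vector in process_store_forwarding,
and a store whose bytes would all be overwritten before the load is
skipped outright:

  #include <cstdio>
  #include <cstdint>
  #include <vector>

  struct store_info
  {
    unsigned offset;  /* Byte offset of the store within the loaded value.  */
    unsigned size;    /* Store width in bytes (QI = 1, HI = 2, SI = 4).  */
  };

  int
  main ()
  {
    /* Program order: QI, SI, HI, all at offset 0 (the second example
       above).  Walking the vector in reverse makes later stores win
       ties, as in the pass.  */
    std::vector<store_info> stores = { { 0, 1 }, { 0, 4 }, { 0, 2 } };

    /* One bit per byte of the loaded value that still needs to be
       forwarded (assumes a load narrower than 64 bytes).  */
    const unsigned load_size = 4;
    uint64_t needed = (uint64_t{1} << load_size) - 1;

    for (auto it = stores.rbegin (); it != stores.rend (); ++it)
      {
        uint64_t mask = ((uint64_t{1} << it->size) - 1) << it->offset;
        uint64_t contributed = mask & needed;
        if (contributed == 0)
          {
            /* Every byte of this store is overwritten before the load,
               so no register move or bit insert is needed for it.  */
            std::printf ("store [%u, %u): skipped, contributes nothing\n",
                         it->offset, it->offset + it->size);
            continue;
          }
        std::printf ("store [%u, %u): forwards byte mask %#llx\n",
                     it->offset, it->offset + it->size,
                     (unsigned long long) contributed);
        needed &= ~mask;
      }
    return 0;
  }

For the QI, SI, HI example this prints:

  store [0, 2): forwards byte mask 0x3
  store [0, 4): forwards byte mask 0xc
  store [0, 1): skipped, contributes nothing

i.e. for little endian the low two bytes come from the HI store, the
next two from the SI store, and the QI store contributes nothing, so
no move or bit insert needs to be emitted for it at all.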