Hi Jeff,

On Fri, Nov 28, 2025 at 8:26 PM Jeff Law <[email protected]> wrote:

>
>
> On 11/27/25 1:56 AM, Konstantinos Eleftheriou wrote:
> >
> >
> > On Thu, Nov 27, 2025 at 2:15 AM Jeff Law <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >
> >
> >     On 11/19/25 7:37 AM, Konstantinos Eleftheriou wrote:
> >      > Sometimes, `store_bit_field` copies the destination register into
> >     a new one,
> >      > which leads to the old register being used in the instructions
> >     that follow
> >      > the ones generated by `store_bit_field`, while the bit field
> >     insertion is
> >      > performed on the new register.
> >      >
> >      > This patch copies back the new destination register into the old
> >     one when
> >      > needed.
> >      >
> >      > gcc/ChangeLog:
> >      >
> >      >          * avoid-store-forwarding.cc
> (generate_bit_insert_sequence):
> >      >       Copy back the new destination register into the old one
> >     when needed.
> >     This sounds more like a bit in store_bit_field.  It's stated purpose
> is
> >     to store the source field into an object.  I'd really like to
> >     understand
> >     in detail the scenario in which str_rtx does not contain the right
> >     value
> >     after a call to store_bit_field.
> >
> >
> > We were getting regressions in these testcases w/o this patch:
> > # * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -
> > fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-
> > functions execution test
> > # * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -g
> > execution test
> > # * gcc: gcc.target/aarch64/vldN_lane_1.c execution test
> >
> > For bf16_vldN_lane_1.c for example, this ASF transformation is triggered:
> > ```
> > Store forwarding detected:
> > From: (insn 35 64 36 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 64 sfp)
> >                  (const_int -16 [0xfffffffffffffff0])) [0 MEM
> > <char[1:32]> [(void *)&vectors]+16 S16 A128])
> >          (reg:V4SI 143 [ MEM <char[1:32]> [(void *)data_22(D)]+16 ]))
> > "../testcase.c":21:20 discrim 1 1333 {*aarch64_simd_movv4si}
> >       (expr_list:REG_DEAD (reg:V4SI 143 [ MEM <char[1:32]> [(void
> > *)data_22(D)]+16 ])
> >          (nil)))
> > To: (insn 41 40 42 2 (set (reg:V4x4BF 60 v28)
> >          (mem/c:V4x4BF (plus:DI (reg/f:DI 64 sfp)
> >                  (const_int -32 [0xffffffffffffffe0])) [2 vectors+0 S32
> > A128])) 4838 {*aarch64_movv4x4bf}
> >       (nil))
> > Store forwarding avoided with bit inserts:
> > With sequence:
> >    (insn 345 0 346 (set (reg:V4x4BF 185)
> >          (reg:V4x4BF 60 v28)) 4838 {*aarch64_movv4x4bf}
> >       (nil))
> >    (insn 346 345 347 (set (subreg:DI (reg:V4x4BF 185) 16)
> >          (subreg:DI (reg:V4SI 184) 0)) 110 {*movdi_aarch64}
> >       (nil))
> >    (insn 347 346 0 (set (subreg:DI (reg:V4x4BF 185) 24)
> >          (subreg:DI (reg:V4SI 184) 8)) 110 {*movdi_aarch64}
> >       (nil))
> > ```
> > Here, str_rtx is v28, which is copied into 185. So, 185 is updated
> instead.
> > The next instruction after the sequence generated by store_bit_field is:
> > ```
> > (insn 45 44 309 2 (set (reg:V4x4BF 60 v28)
> >          (unspec:V4x4BF [
> >                  (mem:BLK (reg/f:DI 183 [ overwrite ]) [0  S8 A8])
> >                  (reg:V4x4BF 60 v28)
> >                  (const_int 3 [0x3])
> >              ] UNSPEC_LD4_LANE)) "./gcc/include/arm_neon.h":28361:10
> > 4597 {aarch64_vec_load_lanesv4x4bf_lanev4bf}
> >       (nil))
> > ```
> > This one uses v28, which holds the old value.
> Yea, this sounds like something in that path not honoring the "result is
> generated in TARGET, if convenient to do so" semantics that many of
> those expansion routines have.
>
> For those kinds of scenarios, something should be testing that the
> result returned by the expander function (or whatever function is in
> question) is a different register than was passed in as the desired
> destination.  And in that case it should emit an insn to copy the value
> from the return value into the actual target value.
>
> As an example of these semantics, see do_store_flag.  The net of having
> these semantics for an API is that every caller has to check and emit
> that extra insn.  Also note this sometimes expands outwards.  Continuing
> with do_store_flag, expand_expr_real_2 calls it in such a way that
> expand_expr_real_2 effectively has the same semantics as does
> expand_expr_real_1, and so-on.
>
> To be clear, the semantics are *awful*, but that's how things have
> worked forever.
>
> So I think the way forward is to trace through store_bit_field and its
> children.  At some point something is returning (reg 185) as a result
> and the caller doesn't recognize that it needs to copy (reg 185) into
> (reg 60).  So if possible you might be able to key a conditional
> breakpoint on generating (reg 185) and something in the call stack when
> that happens is probably the culprit.
>
> Essentially I suspect you're working around a latent bug somewhere.
>
The new register is generated inside `store_bit_field_1` by `gen_lowpart`.
We have sent a new version (
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703268.html).

Thanks,
Konstantinos

>
> jeff
>

Reply via email to