Hi Jeff, On Fri, Nov 28, 2025 at 8:26 PM Jeff Law <[email protected]> wrote:
> > > On 11/27/25 1:56 AM, Konstantinos Eleftheriou wrote: > > > > > > On Thu, Nov 27, 2025 at 2:15 AM Jeff Law <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > On 11/19/25 7:37 AM, Konstantinos Eleftheriou wrote: > > > Sometimes, `store_bit_field` copies the destination register into > > a new one, > > > which leads to the old register being used in the instructions > > that follow > > > the ones generated by `store_bit_field`, while the bit field > > insertion is > > > performed on the new register. > > > > > > This patch copies back the new destination register into the old > > one when > > > needed. > > > > > > gcc/ChangeLog: > > > > > > * avoid-store-forwarding.cc > (generate_bit_insert_sequence): > > > Copy back the new destination register into the old one > > when needed. > > This sounds more like a bit in store_bit_field. It's stated purpose > is > > to store the source field into an object. I'd really like to > > understand > > in detail the scenario in which str_rtx does not contain the right > > value > > after a call to store_bit_field. > > > > > > We were getting regressions in these testcases w/o this patch: > > # * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 - > > fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline- > > functions execution test > > # * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -g > > execution test > > # * gcc: gcc.target/aarch64/vldN_lane_1.c execution test > > > > For bf16_vldN_lane_1.c for example, this ASF transformation is triggered: > > ``` > > Store forwarding detected: > > From: (insn 35 64 36 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 64 sfp) > > (const_int -16 [0xfffffffffffffff0])) [0 MEM > > <char[1:32]> [(void *)&vectors]+16 S16 A128]) > > (reg:V4SI 143 [ MEM <char[1:32]> [(void *)data_22(D)]+16 ])) > > "../testcase.c":21:20 discrim 1 1333 {*aarch64_simd_movv4si} > > (expr_list:REG_DEAD (reg:V4SI 143 [ MEM <char[1:32]> [(void > > *)data_22(D)]+16 ]) > > (nil))) > > To: (insn 41 40 42 2 (set (reg:V4x4BF 60 v28) > > (mem/c:V4x4BF (plus:DI (reg/f:DI 64 sfp) > > (const_int -32 [0xffffffffffffffe0])) [2 vectors+0 S32 > > A128])) 4838 {*aarch64_movv4x4bf} > > (nil)) > > Store forwarding avoided with bit inserts: > > With sequence: > > (insn 345 0 346 (set (reg:V4x4BF 185) > > (reg:V4x4BF 60 v28)) 4838 {*aarch64_movv4x4bf} > > (nil)) > > (insn 346 345 347 (set (subreg:DI (reg:V4x4BF 185) 16) > > (subreg:DI (reg:V4SI 184) 0)) 110 {*movdi_aarch64} > > (nil)) > > (insn 347 346 0 (set (subreg:DI (reg:V4x4BF 185) 24) > > (subreg:DI (reg:V4SI 184) 8)) 110 {*movdi_aarch64} > > (nil)) > > ``` > > Here, str_rtx is v28, which is copied into 185. So, 185 is updated > instead. > > The next instruction after the sequence generated by store_bit_field is: > > ``` > > (insn 45 44 309 2 (set (reg:V4x4BF 60 v28) > > (unspec:V4x4BF [ > > (mem:BLK (reg/f:DI 183 [ overwrite ]) [0 S8 A8]) > > (reg:V4x4BF 60 v28) > > (const_int 3 [0x3]) > > ] UNSPEC_LD4_LANE)) "./gcc/include/arm_neon.h":28361:10 > > 4597 {aarch64_vec_load_lanesv4x4bf_lanev4bf} > > (nil)) > > ``` > > This one uses v28, which holds the old value. > Yea, this sounds like something in that path not honoring the "result is > generated in TARGET, if convenient to do so" semantics that many of > those expansion routines have. > > For those kinds of scenarios, something should be testing that the > result returned by the expander function (or whatever function is in > question) is a different register than was passed in as the desired > destination. And in that case it should emit an insn to copy the value > from the return value into the actual target value. > > As an example of these semantics, see do_store_flag. The net of having > these semantics for an API is that every caller has to check and emit > that extra insn. Also note this sometimes expands outwards. Continuing > with do_store_flag, expand_expr_real_2 calls it in such a way that > expand_expr_real_2 effectively has the same semantics as does > expand_expr_real_1, and so-on. > > To be clear, the semantics are *awful*, but that's how things have > worked forever. > > So I think the way forward is to trace through store_bit_field and its > children. At some point something is returning (reg 185) as a result > and the caller doesn't recognize that it needs to copy (reg 185) into > (reg 60). So if possible you might be able to key a conditional > breakpoint on generating (reg 185) and something in the call stack when > that happens is probably the culprit. > > Essentially I suspect you're working around a latent bug somewhere. > The new register is generated inside `store_bit_field_1` by `gen_lowpart`. We have sent a new version ( https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703268.html). Thanks, Konstantinos > > jeff >
