On Wed, Nov 26, 2025 at 8:00 PM Uros Bizjak <[email protected]> wrote:
>
> On Wed, Nov 26, 2025 at 12:43 PM H.J. Lu <[email protected]> wrote:
> >
> > On Wed, Nov 26, 2025 at 5:53 PM Uros Bizjak <[email protected]> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 10:18 AM H.J. Lu <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 26, 2025 at 3:52 PM Uros Bizjak <[email protected]> wrote:
> > > > >
> > > > > On Wed, Nov 26, 2025 at 8:26 AM H.J. Lu <[email protected]> wrote:
> > > > >
> > > > > > > > The reasons for try_combine_insn are that
> > > > > > > >
> > > > > > > > 1. When general_operand is called, instruction info isn't available.
> > > > > > > > 2. Only recog_for_combine_1 calls init_recog_no_volatile and
> > > > > > > > tries memory operands.
> > > > > > > But if we make stores work sensibly, then we no longer need the
> > > > > > > instruction, right?  It's the mere need to track the instruction that
> > > > > > > seems rather hokey/hackish to me right now.
> > > > > > >
> > > > > > > jeff
> > > > > >
> > > > > > For
> > > > > >
> > > > > > volatile unsigned char u8;
> > > > > >
> > > > > > void test (void)
> > > > > > {
> > > > > >   u8 = u8 + u8;
> > > > > >   u8 = u8 - u8;
> > > > > > }
> > > > > >
> > > > > > When a volatile store is allowed, we generate
> > > > > >
> > > > > > (insn 8 7 9 2 (parallel [
> > > > > >             (set (mem/v/c:QI (symbol_ref:DI ("u8") [flags 0x2]
> > > > > > <var_decl 0x7fe9719d6e40 u8>) [0 u8+0 S1 A8])
> > > > > >                 (ashift:QI (mem/v/c:QI (symbol_ref:DI ("u8") [flags
> > > > > > 0x2]  <var_decl 0x7fe9719d6e40 u8>) [0 u8+0 S1 A8])
> > > > > >                     (const_int 1 [0x1])))
> > > > > >             (clobber (reg:CC 17 flags))
> > > > > >         ]) 
> > > > > > "/export/gnu/import/git/gitlab/x86-gcc-test/gcc/testsuite/gcc.dg/pr86617.c":7:6
> > > > > > 1139 {*ashlqi3_1}
> > > > > >      (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > > > >         (nil)))
> > > > > >
> > > > > > on x86 which leads to
> > > > > >
> > > > > > salb u8(%rip)
> > > > > >
> > > > > > instead of 2 loads.  Without the instruction, we don't know whether a
> > > > > > memory reference should be allowed for stores.
> > > > >
> > > > > The combine pass doesn't handle volatiles correctly in all cases; that is
> > > > > why I think this propagation should be done in late-combine.  We are
> > > > > interested only in propagating memory loads/stores into instructions,
> > > > > and late-combine does exactly that.
> > > > >
> > > > > Uros.
> > > >
> > > > Late-combine doesn't try to combine memory references at all.
> > >
> > > Try PR122343 testcase with the attached patch (gcc -O2):
> > >
> > > _.308r.late_combine1:
> > >
> > > trying to combine definition of r98 in:
> > >     6: r98:SI=[`bar']
> > > into:
> > >     8: {r102:SI=r103:SI+r98:SI;clobber flags:CC;}
> > >       REG_DEAD r103:SI
> > >       REG_DEAD r98:SI
> > >       REG_UNUSED flags:CC
> > > successfully matched this instruction to *addsi_1:
> > > (parallel [
> > >         (set (reg:SI 102 [ _5 ])
> > >             (plus:SI (reg:SI 103 [ z_3 ])
> > >                 (mem/v/c:SI (symbol_ref:DI ("bar") [flags 0x40]
> > > <var_decl 0x7f7810bd6e40 bar>) [1 bar+0 S4 A32])))
> > >         (clobber (reg:CC 17 flags))
> > >     ])
> > > original cost = 5 + 4 (weighted: 9.000000), replacement cost = 9
> > > (weighted: 9.000000); keeping replacement
> > > rescanning insn with uid = 8.
> > > updating insn 8 in-place
> > > verify found no changes in insn with uid = 8.
> > > deleting insn 6
> > > deleting insn with uid = 6.
> > >
> > > foo:
> > >        imull   $123, %edi, %eax
> > >        addl    bar(%rip), %eax
> > >        ret
> > >
> > > Uros.
> >
> > It doesn't work on:
> >
> > extern volatile int bar;
> >
> > void
> > foo (void)
> > {
> >   bar += 123;
> > }
> >
> > Your patch generates:
> >
> > movl bar(%rip), %eax
> > addl $123, %eax
> > movl %eax, bar(%rip)
>
> This is expected; the late-combine pass won't create RMW instructions.
>
> Uros.

I am testing this patch.  For

extern volatile unsigned char u8;

void
test (void)
{
  u8 = u8 + u8;
  u8 = u8 - u8;
}

it generates:

movzbl u8(%rip), %eax
addb %al, u8(%rip)
movzbl u8(%rip), %eax
subb u8(%rip), %al
movb %al, u8(%rip)
ret
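
To make the intent of the combine.cc hunk below explicit: the *ashlqi3_1 RTL
quoted earlier shows combine folding the two volatile reads in "u8 + u8" into
a single shift, which drops one access.  A minimal standalone illustration of
the constraint (plain C, not part of the patch):

volatile unsigned char v;

unsigned char
twice (void)
{
  /* "v + v" must perform two volatile reads, so it must not be
     rewritten as "v << 1", which would perform only one.  */
  return v + v;
}

Note that the output above still performs one memory access per volatile
access in the source (six in total); only the instruction forms change.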

-- 
H.J.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 66963222efc..62361a74c32 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -5781,7 +5781,13 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, bool in_dest, bool in_cond)
       break;
     case RTX_COMM_ARITH:
     case RTX_BIN_ARITH:
-      temp = simplify_binary_operation (code, mode, XEXP (x, 0), XEXP (x, 1));
+      /* Don't simplify an operation with 2 volatile memory references.  */
+      if (!(MEM_P (XEXP (x, 0))
+	    && MEM_VOLATILE_P (XEXP (x, 0))
+	    && MEM_P (XEXP (x, 1))
+	    && MEM_VOLATILE_P (XEXP (x, 1))))
+	temp = simplify_binary_operation (code, mode, XEXP (x, 0),
+					  XEXP (x, 1));
       break;
     case RTX_BITFIELD_OPS:
     case RTX_TERNARY:
diff --git a/gcc/common.opt b/gcc/common.opt
index d44f713ae34..0c78b5199c6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1828,6 +1828,10 @@ ffunction-sections
 Common Var(flag_function_sections)
 Place each function into its own section.
 
+ffuse-ops-with-volatile-access
+Target Var(flag_fuse_ops_with_volatile_access) Init(1)
+Allow operations with volatile memory access in the combine pass.
+
 fgcse
 Common Var(flag_gcse) Optimization
 Perform global common subexpression elimination.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index d13af0a8e7c..870d2efdbed 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -782,6 +782,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-ffunction-cse)
 ffunction-sections
 UrlSuffix(gcc/Optimize-Options.html#index-ffunction-sections)
 
+ffuse-ops-with-volatile-access
+UrlSuffix(gcc/Optimize-Options.html#index-ffuse-ops-with-volatile-access)
+
 fgcse
 UrlSuffix(gcc/Optimize-Options.html#index-fgcse)
 
diff --git a/gcc/recog.cc b/gcc/recog.cc
index 67d7fa63069..bfa10df9e4b 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1596,7 +1596,11 @@ general_operand (rtx op, machine_mode mode)
     {
       rtx y = XEXP (op, 0);
 
-      if (! volatile_ok && MEM_VOLATILE_P (op))
+      /* If -ffuse-ops-with-volatile-access is enabled, allow volatile
+	 memory references.  */
+      if (!flag_fuse_ops_with_volatile_access
+	  && !volatile_ok
+	  && MEM_VOLATILE_P (op))
 	return false;
 
       /* Use the mem's mode, since it will be reloaded thus.  LRA can
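
Since flag_fuse_ops_with_volatile_access defaults to 1, the old (pre-patch)
behaviour should remain reachable with -fno-fuse-ops-with-volatile-access
(assuming the usual .opt machinery generates the negative form), e.g.

  gcc -O2 -fno-fuse-ops-with-volatile-access -S pr86617.c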
