On Wed, Nov 26, 2025 at 10:53 AM Uros Bizjak <[email protected]> wrote:

> > > > For
> > > >
> > > > volatile unsigned char u8;
> > > >
> > > > void test (void)
> > > > {
> > > >   u8 = u8 + u8;
> > > >   u8 = u8 - u8;
> > > > }
> > > >
> > > > When volatile store is allowed,  we generate
> > > >
> > > > (insn 8 7 9 2 (parallel [
> > > >             (set (mem/v/c:QI (symbol_ref:DI ("u8") [flags 0x2]
> > > > <var_decl 0x7fe9719d6e40 u8>) [0 u8+0 S1 A8])
> > > >                 (ashift:QI (mem/v/c:QI (symbol_ref:DI ("u8") [flags
> > > > 0x2]  <var_decl 0x7fe9719d6e40 u8>) [0 u8+0 S1 A8])
> > > >                     (const_int 1 [0x1])))
> > > >             (clobber (reg:CC 17 flags))
> > > >         ]) 
> > > > "/export/gnu/import/git/gitlab/x86-gcc-test/gcc/testsuite/gcc.dg/pr86617.c":7:6
> > > > 1139 {*ashlqi3_1}
> > > >      (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > >         (nil)))
> > > >
> > > > on x86 which leads to
> > > >
> > > > salb u8(%rip)
> > > >
> > > > instead of 2 loads.  Without the instruction, we don't know if a
> > > > memory reference
> > > > should be allowed for stores.
> > >
> > > combine pass doesn't handle volatiles correctly in all cases, this is
> > > the reason I think this propagation should be done in late-combine. We
> > > are interested only in propagations of memory loads/stores into
> > > instructions, and late-combine does exactly that.
> > >
> > > Uros.
> >
> > late combine doesn't try to combine memory references at all.

So, by enabling propagation of volatile defs in the late-combine pass,
your above testcase compiles to

       movzbl  u8(%rip), %eax
       addb    u8(%rip), %al
       movb    %al, u8(%rip)
       movzbl  u8(%rip), %eax
       subb    u8(%rip), %al
       movb    %al, u8(%rip)
       ret

which is better than the current:

       movzbl  u8(%rip), %eax
       movzbl  u8(%rip), %edx
       addl    %edx, %eax
       movb    %al, u8(%rip)
       movzbl  u8(%rip), %eax
       movzbl  u8(%rip), %edx
       subl    %edx, %eax
       movb    %al, u8(%rip)
       ret

but still worse than clang's:

       movzbl  u8(%rip), %eax
       addb    %al, u8(%rip)
       movzbl  u8(%rip), %eax
       subb    u8(%rip), %al
       movb    %al, u8(%rip)
       retq

(clang is able to propagate *to* the volatile output and form an RMW insn).

By improving the late-combine pass to allow volatile defs (and checking
that each volatile def gets propagated to *exactly one* place), we can
already substantially improve the generated code for e.g. the Linux
kernel. The kernel has many volatile reads that can be combined with a
follow-up use, while RMW insns are used relatively rarely.
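
To illustrate the "exactly one place" condition, here is a minimal
sketch in C. The names status, ready and twice are hypothetical and
not taken from the kernel or from any patch. In ready() the volatile
load has a single use, so it is a candidate for being folded into the
consumer as a memory operand. In twice() the loaded value has two
uses, so propagating the load into both would turn one volatile
access into two, which must not happen. Whether a given compiler
actually folds the load in ready() depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical volatile object, for illustration only.  */
volatile uint8_t status;

/* One volatile load with exactly one use: the load may be folded
   into the consuming instruction without changing the number of
   volatile accesses.  */
int ready (void)
{
  uint8_t v = status;
  return (v & 0x01) != 0;
}

/* One volatile load with two uses: substituting the memory
   reference into both uses would duplicate the volatile access,
   so the load has to stay a single separate instruction.  */
int twice (void)
{
  uint8_t v = status;
  return v + v;
}
```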

Uros.
