On Sat, Jul 25, 2020 at 10:10:13PM +0200, [email protected] wrote:
> On Sat, Jul 25, 2020 at 12:39:09PM -0700, Paul E. McKenney wrote:

> > This gets me the following for __rcu_read_lock():
> > 
> > 00000000000000e0 <__rcu_read_lock>:
> >       e0:   48 8b 14 25 00 00 00    mov    0x0,%rdx
> >       e7:   00 
> >       e8:   8b 82 e0 02 00 00       mov    0x2e0(%rdx),%eax
> >       ee:   83 c0 01                add    $0x1,%eax
> >       f1:   89 82 e0 02 00 00       mov    %eax,0x2e0(%rdx)
> >       f7:   c3                      retq   
> >       f8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
> >       ff:   00 
> > 
> > One might hope for an inc instruction, but this isn't bad.  We do lose
> > a few instructions compared to the C-language case due to differences
> > in address calculation:
> > 
> > 00000000000000e0 <__rcu_read_lock>:
> >       e0:   48 8b 04 25 00 00 00    mov    0x0,%rax
> >       e7:   00 
> >       e8:   83 80 e0 02 00 00 01    addl   $0x1,0x2e0(%rax)
> >       ef:   c3                      retq   
> 
> Sheesh, that's daft... I think this is one of the cases where GCC is
> perhaps overly cautious when presented with 'volatile'.
> 
> It has a history of generating excessively crap code around volatile,
> and while it has improved somewhat, this seems to show there's still
> room for improvement...
> 
> I suppose this is the point where we go bug a friendly compiler person.

Having had a play with godbolt.org, it seems clang isn't affected by
this particular flavour of crazy, but GCC does indeed refuse to fuse the
address calculation and the addition.
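
For anyone who wants to reproduce this on godbolt.org, something along
these lines should be enough. Treat it as a rough sketch: the macros are
simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(), and
struct task, current_task and the two function names are made up for
illustration. Only the 0x2e0 offset is taken from the listings above.
Built with optimisation enabled, the volatile variant is the one
expected to come out as separate load, add and store under GCC, while
the plain variant can use a single addl to memory.

```c
/* Simplified stand-ins for the kernel macros (assumption, not the real
 * definitions): force a volatile access to an ordinary lvalue. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical cut-down task structure. The padding only exists so the
 * counter sits at offset 0x2e0, matching the disassembly above. */
struct task {
	char pad[0x2e0];
	int rcu_read_lock_nesting;
};

/* Hypothetical global standing in for the kernel's 'current'. */
struct task *current_task;

/* Volatile load and volatile store around the addition. */
void lock_volatile(void)
{
	WRITE_ONCE(current_task->rcu_read_lock_nesting,
		   READ_ONCE(current_task->rcu_read_lock_nesting) + 1);
}

/* Plain C increment with no volatile accesses. */
void lock_plain(void)
{
	current_task->rcu_read_lock_nesting++;
}
```

Both functions do the same thing to the counter; the only difference
that should show up is in the generated code, so comparing the two
bodies side by side in GCC and clang output is the whole point.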
