On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> > > 
> > > > The data tearing issue is almost a non-issue. We're not going to add
> > > > WRITE_ONCE() to these kinds of places for no good reason.
> > > 
> > > Paulmck actually has an example of that somewhere; ISTR that particular
> > > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > > people (both GCC and LLVM) to state that their respective compilers will
> > > not do load/store tearing for machine word sized load/stores.
> > 
> > I do very much recall such an example, but I am now unable to either
> > find it or reproduce it.  :-/
> > 
> > If I cannot turn it up in a few days, I will ask the LWN editors to
> > make appropriate changes to the "Who is afraid" article.
> > 
> > > Without this written guarantee (which supposedly was in older GCC
> > > manuals but has since gone missing), I'm loathe to rely on it.
> > > 
> > > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > > does do happen.
> > 
> > But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> > races on any variable for which there is at least one READ_ONCE().
> > So we can still use WRITE_ONCE() as we would like in our own code.
> > Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> > it is better than the proverbial kick in the teeth.
> > 
> > Of course, if anyone knows of a compiler/architecture combination that
> > really does tear stores of 32-bit constants, please do not keep it
> > a secret!  After all, it would be good to get that addressed easily
> > starting now rather than after a difficult and painful series of
> > debugging sessions.
> 
> It's not quite what you asked for, but if you look at the following
> silly code:
> 
> typedef unsigned long long u64;
> 
> struct data {
>       u64 arr[1023];
>       u64 flag;
> };
> 
> void foo(struct data *x)
> {
>       int i;
> 
>       for (i = 0; i < 1023; ++i)
>               x->arr[i] = 0;
> 
>       x->flag = 0;
> }
> 
> void bar(u64 *x)
> {
>       *x = 0xabcdef10abcdef10;
> }
> 
> Then arm64 clang (-O2) generates the following for foo:
> 
> foo:                                    // @foo
>       stp     x29, x30, [sp, #-16]!   // 16-byte Folded Spill
>       orr     w2, wzr, #0x2000
>       mov     w1, wzr
>       mov     x29, sp
>       bl      memset
>       ldp     x29, x30, [sp], #16     // 16-byte Folded Reload
>       ret
> 
> and so the store to 'flag' has become part of the memset, which could
> easily be bytewise in terms of atomicity (and this isn't unlikely given
> we have a DC ZVA instruction which only guaratees bytewise atomicity).
> 
> GCC (also -O2) generates the following for bar:
> 
> bar:
>       mov     w1, 61200
>       movk    w1, 0xabcd, lsl 16
>       stp     w1, w1, [x0]
>       ret
> 
> and so it is using a store-pair instruction to reduce the complexity in
> the immediate generation. Thus, the 64-bit store will only have 32-bit
> atomicity. In fact, this is scary because if I change bar to:
> 
> void bar(u64 *x)
> {
>       *(volatile u64 *)x = 0xabcdef10abcdef10;
> }
> 
> then I get:
> 
> bar:
>       mov     w1, 61200
>       movk    w1, 0xabcd, lsl 16
>       str     w1, [x0]
>       str     w1, [x0, 4]
>       ret
> 
> so I'm not sure that WRITE_ONCE would even help :/

Well, I can have the LWN article cite your email, then.  So thank you
very much!

Is generation of this code for a 64-bit volatile store considered a bug?
Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
would guess that Thomas and Linus would ask a similar bugginess question
for normal stores.  ;-)

> It's worth noting that:
> 
> void baz(atomic_long *x)
> {
>       atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed)
> }
> 
> does the right thing:
> 
> baz:
>       mov     x1, 61200
>       movk    x1, 0xabcd, lsl 16
>       movk    x1, 0xef10, lsl 32
>       movk    x1, 0xabcd, lsl 48
>       str     x1, [x0]
>       ret

OK, the C11 and C++11 guys should be happy with this.

> Whilst these examples may be contrived, I do thing they illustrate that
> we can't simply say "stores to aligned, word-sized pointers are atomic".

And thank you again!

                                                        Thanx, Paul

Reply via email to