On Thu, Jul 12, 2018 at 01:52:49PM +0200, Andrea Parri wrote:
> On Thu, Jul 12, 2018 at 09:40:40AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 11, 2018 at 02:34:21PM +0200, Andrea Parri wrote:
> > > 2) Resolve the above mentioned controversy (the inconsistency between
> > > - locking operations and atomic RMWs on one side, and their actual
> > > implementation in generic code on the other), thus enabling the use
> > > of LKMM _and_ its tools for the analysis/reviewing of the latter.
> >
> > This is a good point; so let's see if there is something we can do to
> > strengthen the model so it all works again.
> >
> > And I think if we raise atomic*_acquire() to require TSO (but ideally
> > raise it to RCsc) we're there.
>
> You mean: "when paired with a po-earlier release to the same memory
> location", right? I am afraid that neither arm64 nor riscv current
> implementations would ensure "(r1 == 1 && r2 == 0) forbidden" if we
> removed the po-earlier spin_unlock()...

Yes indeed. More on this below.

> But again, these are subtle patterns, and my guess is that several/
> most kernel developers really won't care about such guarantees (and
> if some will do, they'll have the tools to figure out what they can
> actually rely on ...)

Yes it is subtle, yes most people won't care, however the problem is
that it is subtly the wrong way around. People expect causality, this
is a human failing perhaps, but that's how it is.

And I strongly feel we should have our locks be such that they don't
subtly break things.

Take for instance the pattern where RCU relies on RCsc locks, this is
an entirely simple and straightforward use of locks, yet completely
fails on this subtle point.

And people will not even try and use complicated tools for apparently
simple things. They'll say, oh of course this simple thing will work
right.

I'm still hoping we can convince the PowerPC people that they're wrong,
and get rid of this wart and just call all locks RCsc.

> OTOH (as I pointed out earlier) the strengthening we're configuring
> will prevent some arch. (riscv being just the example of today!) to
> go "full RCpc", and this will inevitably "complicate" both the LKMM
> and the reviewing process of related changes (atomics, locking, ...;
> c.f., this debate), apparently, just because you ;-) want to "care"
> about these guarantees.

It's not just me btw, Linus also cares about these matters. Widely used
primitives such as spinlocks should not have subtle and
counter-intuitive behaviour such as RCpc.

Anyway, back to the problem of being able to use the memory model to
describe locks. This is I think a useful property.

My earlier reasoning was that:

  - smp_store_release() + smp_load_acquire() := RCpc

  - we use smp_store_release() as unlock()

Therefore, if we want unlock+lock to imply at least TSO (ideally
smp_mb()) we need lock to make up for whatever unlock lacks.

Hence my proposal to strengthen rmw-acquire, because that is the basic
primitive used to implement lock.

But as you (and Will) point out, we don't so much care about
rmw-acquire semantics as much as that we care about unlock+lock
behaviour. Another way to look at this is to define:

  smp-store-release + rmw-acquire := TSO (ideally smp_mb)

But then we also have to look at:

  rmw-release + smp-load-acquire
  rmw-release + rmw-acquire

for completeness' sake, and I would suggest they result in (at least)
the same (TSO) ordering as the one we really care about.

One alternative is to no longer use smp_store_release() for unlock(),
and say define atomic_set_release() to be in the rmw-release class
instead of being a simple smp_store_release().

Another, and I like this proposal least, is to introduce a new barrier
to make this all work.
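
For illustration only, the two unlock flavours discussed above can be
sketched in userspace C11. This is not the kernel's spinlock code: the
names toy_lock, toy_lock_acquire(), toy_trylock(), toy_unlock_store()
and toy_unlock_rmw() are hypothetical, and C11 release/acquire is only
an approximate stand-in for smp_store_release() and the rmw-acquire and
rmw-release classes. Nothing here shows that unlock+lock gives TSO or
RCsc on any architecture; it only shows which primitive sits on which
side of the pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock for illustration: 0 = free, 1 = held. */
struct toy_lock {
	atomic_int locked;
};

/* lock(): spin on an RMW with acquire ordering ("rmw-acquire"). */
static void toy_lock_acquire(struct toy_lock *l)
{
	/* The exchange returns the old value; 0 means we took the lock. */
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		; /* busy-wait while another holder has it */
}

/* trylock(): a single rmw-acquire attempt, true on success. */
static bool toy_trylock(struct toy_lock *l)
{
	return atomic_exchange_explicit(&l->locked, 1,
					memory_order_acquire) == 0;
}

/* unlock() as a plain store with release ordering ("store-release"). */
static void toy_unlock_store(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Alternative unlock() as an RMW with release ordering ("rmw-release"). */
static void toy_unlock_rmw(struct toy_lock *l)
{
	(void)atomic_exchange_explicit(&l->locked, 0, memory_order_release);
}
```

With toy_unlock_store() the pairing is store-release + rmw-acquire;
with toy_unlock_rmw() it is rmw-release + rmw-acquire. Which ordering
each pairing should be required to provide is the open question in this
thread.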