Re: [RFC] arm64: Enforce observed order for spinlock and data

bdegraaf Thu, 13 Oct 2016 13:02:56 -0700

On 2016-10-13 07:02, Will Deacon wrote:

Brent,

On Wed, Oct 12, 2016 at 04:01:06PM -0400, [email protected] wrote:


Everything from this point down needs clarification.

All arm64 lockref accesses that occur without taking the spinlock must
behave like true atomics, ensuring successive operations are all done
sequentially.


What is a "true atomic"? What do you mean by "successive"? What do you
mean by "done sequentially"?

Despite the use case in dentry, lockref itself is a generic locking API,
and any problems I describe here are with the generic API itself, not
necessarily the dentry use case.  I'm not patching dentry--I'm fixing
lockref.

By necessity, the API must do its update atomically, as keeping things
correct involves potential spinlock access by other agents which may opt
to use the spinlock API or the lockref API at their discretion. With the
current arm64 spinlock implementation, it is possible for the lockref to
observe the changed contents of the protected count without observing
the spinlock being locked, which could lead to missed changes to the
lock_count itself, because any calculations made on it could be
overwritten or completed in a different sequence.

(Spinlock locked access is obtained with a simple store under certain
scenarios. My attempt to fix this in the spinlock code was met with
resistance saying it should be addressed in lockref, since that is the
API that would encounter the issue. These changes were initiated in
response to that request. Additional ordering problems were uncovered
when I looked into lockref itself.)

The example below involves only a single agent exactly as you explain
the problem in commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67. Even for
a single execution agent, this means that the code below could access
out of order. As lockref is a generic API, it doesn't matter whether
dentry does this or not.

By "done sequentially," I mean "accessed in program order."

As far as "true atomic" goes, I am referring to an atomic in the same
sense you did in commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67.

The guarantee provided by lockref is that, if you hold the spinlock,
then you don't need to use atomics to inspect the reference count, as it
is guaranteed to be stable. You can't just go around replacing spin_lock
calls with lockref_get -- that's not what this is about.

I am not sure where you got the idea that I was referring to replacing
spinlocks with lockref calls.  That is not the foundation for this fix.

Currently the lockref accesses, when decompiled, look like the following
sequence:
                    <Lockref "unlocked" Access [A]>

                    // Lockref "unlocked" (B)
                1:  ldxr   x0, [B]         // Exclusive load
                     <change lock_count B>
                    stxr   w1, x0, [B]
                    cbnz   w1, 1b

                     <Lockref "unlocked" Access [C]>
Even though access to the lock_count is protected by exclusives, this is
not enough to guarantee order: The lock_count must change atomically, in
order, so the only permitted ordering would be:
                              A -> B -> C


Says who? Please point me at a piece of code that relies on this. I'm
willing to believe that there are bugs in this area, but waving your
hands around and saying certain properties "must" hold is not helpful
unless you can say *why* they must hold and *where* that is required.

The lockref code must access in order, because other agents can observe
it via spinlock OR lockref APIs. Again, this is a generic API, not an
explicit part of dentry. Other code will use it, and the manner in which
it is used in dentry is not relevant. What lock_count is changed to is
not prescribed by the lockref API. There is no guarantee whether it be
an add, subtract, multiply, divide, set to some explicit value, etc. But
the changes must be done in program order and observable in that same
order by other agents: Therefore, the spinlock and lock_count must be
accessed atomically, and observed to change atomically at the system
level.

I am not off base saying lockref is an atomic access. Here are some
references:

Under Documentation/filesystems/path-lookup.md, the dentry->d_lockref
mechanism is described as an atomic access.

At the time lockref was introduced, The Linux Foundation gave a
presentation at LinuxCon 2014 that can be found at the following link:

https://events.linuxfoundation.org/sites/events/files/slides/linuxcon-2014-locking-final.pdf

On page 46, it outlines the lockref API. The first lines of the slide
give the relevant details.

Lockref
• *Generic* mechanism to *atomically* update a reference count that is
  protected by a spinlock without actually acquiring the spinlock itself.

While dentry's use is mentioned, this API is not restricted to the use
case of dentry.

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are not protected by any sort of
barrier, and hence are permitted to reorder freely, resulting in
orderings such as

                           Bl -> A -> C -> Bs
Again, why is this a problem? It's exactly the same as if you did:

        spin_lock(lock);
        inc_ref_cnt();
        spin_unlock(lock);
Accesses outside of the critical section can still be reordered. Big
deal.

Since the current code resembles but actually has *fewer* ordering
effects compared to the example used by your atomic.h commit, even
though A->B->C is in program order, it could access out of order
according to your own commit text on commit
8e86f0b409a44193f1587e87b69c5dcf8f65be67.

Taking spin_lock/spin_unlock, however, includes ordering by nature of
the load-acquire observing the store-release of a prior unlock, so
ordering is enforced with the spinlock version of accesses. The lockref
itself has *no* ordering enforced, unless a locked state is encountered
and it falls back to the spinlock code. So this is a fundamental
difference between lockref and spinlock.  So, no, lockref ordering is
currently not exactly the same as spinlock--but it should be.
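
To make the point under discussion concrete, here is a minimal userspace
sketch of a lockref-style lockless update, written with C11 atomics
rather than kernel primitives. Everything in it is an assumption made
for illustration: struct lockref_model, model_get() and the bit layout
are hypothetical, memory_order_relaxed only approximates
cmpxchg64_relaxed, and memory_order_seq_cst is only a rough analog of
the kernel's fully ordered cmpxchg64. The only thing that differs
between the two variants is the ordering argument on the successful
compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct lockref: lock word in the low 32 bits,
 * reference count in the high 32 bits. Not the kernel's definition. */
struct lockref_model {
	_Atomic uint64_t lock_count;
};

static uint32_t model_count(struct lockref_model *ref)
{
	return (uint32_t)(atomic_load_explicit(&ref->lock_count,
					       memory_order_relaxed) >> 32);
}

/* Lockless "get": bump the count only while the lock half reads as
 * unlocked (zero). 'order' is the ordering applied on success. */
static bool model_get(struct lockref_model *ref, memory_order order)
{
	uint64_t old = atomic_load_explicit(&ref->lock_count,
					    memory_order_relaxed);

	while ((uint32_t)old == 0) {
		uint64_t next = old + (UINT64_C(1) << 32);

		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&ref->lock_count,
				&old, next, order, memory_order_relaxed))
			return true;
	}
	return false;	/* locked: a caller would fall back to the spinlock */
}

static void model_demo(void)
{
	struct lockref_model ref = { 0 };

	assert(model_get(&ref, memory_order_relaxed));	/* relaxed variant */
	assert(model_get(&ref, memory_order_seq_cst));	/* ordered variant */
	assert(model_count(&ref) == 2);

	atomic_fetch_or(&ref.lock_count, 1);	/* mark the lock half as held */
	assert(!model_get(&ref, memory_order_seq_cst));
}
```

Both variants update the count atomically; what the relaxed one omits is
any ordering of the surrounding accesses (A and C in the sequence above)
against the update itself.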

In this specific scenario, since "change lock_count" could be an
increment, a decrement or even a set to a specific value, there could be
trouble.

What trouble?

Take, for example, a use case where the ref count is either positive or
zero. If increments and decrements hit out of order, a decrement that
was supposed to come after an increment would instead do nothing if the
value of the lock started at zero. Then when the increment hit later,
the ref count would remain positive with a net effect of +1 to the ref
count instead of +1-1=0. Again, however, the lockref does not specify
how the contents of lock_count are manipulated; it was only meant to
guarantee that they are done atomically when the lock is not held.
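
The arithmetic in that example can be checked with a small sketch. The
helpers below are hypothetical (they are not lockref functions) and
assume a counter whose decrement is a no-op at zero, which is the use
case described above. The sketch only shows that the two orders give
different results; it says nothing about whether hardware would actually
reorder them.

```c
#include <assert.h>

/* Hypothetical counter operations; the decrement saturates at zero. */
static unsigned int ref_inc(unsigned int count)
{
	return count + 1;
}

static unsigned int ref_dec_floor_zero(unsigned int count)
{
	return count ? count - 1 : 0;
}

/* Program order: the increment lands first, then the decrement. */
static unsigned int apply_in_order(unsigned int start)
{
	return ref_dec_floor_zero(ref_inc(start));
}

/* Swapped order: the decrement lands first and may do nothing. */
static unsigned int apply_swapped(unsigned int start)
{
	return ref_inc(ref_dec_floor_zero(start));
}

static void order_demo(void)
{
	assert(apply_in_order(0) == 0);	/* +1-1 = 0, as intended */
	assert(apply_swapped(0) == 1);	/* decrement lost: net +1 */
	assert(apply_in_order(5) == apply_swapped(5));	/* away from zero, order is invisible */
}
```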

With more agents accessing the lockref without taking the lock, even
scenarios where the cmpxchg passes falsely can be encountered, as there
is no guarantee that the "old" value will not match exactly a newer
value due to out-of-order access by a combination of agents that
increment and decrement the lock_count by the same amount.

This is the A-B-A problem, but I don't see why it affects us here. We're
dealing with a single reference count.
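
For readers unfamiliar with the term, the following is a minimal
single-threaded sketch of the A-B-A pattern using C11 atomics. The
function name and the interleaving are made up for illustration: the
"other agents" are simulated inline. It shows only that a
compare-and-exchange cannot tell "unchanged" from "changed and changed
back"; it does not settle whether that matters for a plain reference
count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns whether agent 1's compare-and-exchange succeeds after two
 * other (simulated) agents have moved the value A -> B -> A. */
static bool aba_cmpxchg_succeeds(uint64_t *final_value)
{
	_Atomic uint64_t count = 1;

	/* Agent 1 samples the value (A). */
	uint64_t old = atomic_load_explicit(&count, memory_order_relaxed);

	/* Agents 2 and 3 increment and decrement by the same amount. */
	atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);	/* B */
	atomic_fetch_sub_explicit(&count, 1, memory_order_relaxed);	/* A again */

	/* Agent 1's update still matches its stale sample. */
	bool ok = atomic_compare_exchange_strong_explicit(&count, &old,
			old + 1, memory_order_relaxed, memory_order_relaxed);

	*final_value = atomic_load_explicit(&count, memory_order_relaxed);
	return ok;
}

static void aba_demo(void)
{
	uint64_t final_value;

	assert(aba_cmpxchg_succeeds(&final_value));
	assert(final_value == 2);
}
```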


If lockref accesses were to occur on many PEs, there are all sorts of
things that could happen in terms of who wins what, and what they set
the lock_count to. My point is simply that each access should be atomic
because lockref is a generic API and was intended to be a lockless
atomic access. Leaving this problem open until someone else introduces a
use that exposes it, which could happen in the main kernel code, is
probably not a good idea, as it could prove difficult to track down.

Since multiple agents are accessing this without locking the spinlock,
this access must have the same protections in place as atomics do in the
arch's atomic.h.


Why? I don't think that it does. Have a look at how lockref is used by
the dcache code: it's really about keeping a reference to a dentry,
which may be in the process of being unhashed and removed. The
interaction with concurrent updaters to the dentry itself is handled
using a seqlock, which does have the necessary barriers. Yes, the code
is extremely complicated, but given that you're reporting issues based
on code inspection, then you'll need to understand what you're changing.

Again, this is a generic API, not an API married to dentry. If it were
for dentry's sole use, it should not be accessible outside of the dentry
code. While the cmpxchg64_relaxed case may be OK for dentry, it is not
OK for the generic case.

Fortunately, the fix is not complicated: merely removing the errant
_relaxed option on the cmpxchg64 is enough to introduce exactly the same
code sequence justified in commit 8e86f0b409a44193f1587e87b69c5dcf8f65be67
to fix arm64 atomics.
I introduced cmpxchg64_relaxed precisely for the lockref case. I still
don't see a compelling reason to strengthen it. If you think there's a
bug, please spend the effort to describe how it manifests and what can
actually go wrong in the existing codebase. Your previous patches fixing
so-called bugs found by inspection have both turned out to be bogus, so
I'm sorry, but I'm not exactly leaping on your contributions to this.

Will

I have detailed the problems here, and they are with the generic case,
no hand waving required.

On a further note, it is not accurate to say that my prior patches were
bogus: One called attention to a yet-to-be-corrected problem in the
ARMv8 Programmer's Guide, and the other was sidestepped by a refactor
that addressed the problem I set out to fix with a control flow change.
Since that problem was the fundamental reason I had worked on the
gettime code in the first place, I abandoned my effort. The refactor
that fixed the control-flow problem, however, is still missing on v4.7
and earlier kernels (sequence lock logic should be verified prior to the
isb that demarcates the virtual counter register read). I have confirmed
this is an issue on various ARMv8 hardware, sometimes obtaining
identical register values between multiple reads that were delayed such
that they should have shown changes, evidence that the register read
accessed prior to the seqlock update having finished (the control flow
problem).

Brent
