Hi Thomas,
We are still waiting for the comments from Honnappa. In our understanding, the
missing barrier is a bug according to the model. We reproduced the scenario in
herd7, which implements the authoritative memory model:
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
Here is a litmus test showing that the XCHG (when compiled to LDAXR and
STLXR) is not atomic with respect to memory updates to other locations:
-----
AArch64 XCHG-nonatomic
{
0:X1=locked; 0:X3=next;
1:X1=locked; 1:X3=next; 1:X5=tail;
}
P0 | P1;
LDR W0, [X3] | MOV W0, #1;
CBZ W0, end | STR W0, [X1]; (* init locked *)
MOV W2, #2 | MOV W2, #0;
STR W2, [X1] | xchg:;
end: | LDAXR W6, [X5];
NOP | STLXR W4, W0, [X5];
NOP | CBNZ W4, xchg;
NOP | STR W0, [X3]; (* set next *)
exists
(0:X2=2 /\ locked=1)
-----
(web version of herd7: http://diy.inria.fr/www/?record=aarch64)
P1 is trying to acquire the lock:
- initializes locked
- does the xchg on the tail of the mcslock
- sets the next
P0 is releasing the lock:
- if next is not set, just terminates
- if next is set, stores 2 in locked
The initialization of locked should never overwrite the store of 2 to locked,
but the model allows exactly that outcome (0:X2=2 /\ locked=1).
To prevent that reordering, the last store of P1 must have "release"
semantics, i.e., it must be an STLR.
This is equivalent to the reordering occurring in the mcslock of librte_eal.
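For illustration, here is a minimal C11 sketch of the same pattern with the
release store in place. It is not the actual librte_eal code: the names
mcs_node, mcs_lock and mcs_unlock are hypothetical, and the memory orders are
our assumption of what expresses the fix. The comments map each step to the
P1 and P0 threads of the litmus test.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type, loosely modelled on an MCS lock queue node. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_int locked;
};

/* Acquire path; corresponds to P1 in the litmus test. */
static void mcs_lock(_Atomic(struct mcs_node *) *tail, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    /* init locked */
    atomic_store_explicit(&me->locked, 1, memory_order_relaxed);

    /* xchg on the tail of the lock */
    struct mcs_node *prev =
        atomic_exchange_explicit(tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty, lock acquired */

    /*
     * set next: the release order keeps the "init locked" store above
     * from being reordered after this store (STLR instead of STR).
     */
    atomic_store_explicit(&prev->next, me, memory_order_release);

    /* spin until the previous owner hands the lock over */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

/* Release path; corresponds to P0 in the litmus test. */
static void mcs_unlock(_Atomic(struct mcs_node *) *tail, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        /* no successor visible: try to reset the tail */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                tail, &expected, (struct mcs_node *)NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* a successor is enqueueing; wait until it sets next */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* hand over the lock (the litmus test stores 2 here to tell it apart) */
    atomic_store_explicit(&next->locked, 0, memory_order_release);
}
```

With a relaxed store to prev->next instead, the compiler may emit a plain STR,
which is the situation modelled by the litmus test above.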
Best regards,
-Diogo
-----Original Message-----
From: Thomas Monjalon [mailto:[email protected]]
Sent: Tuesday, October 6, 2020 11:50 PM
To: Phil Yang <[email protected]>; Diogo Behrens <[email protected]>;
Honnappa Nagarahalli <[email protected]>
Cc: [email protected]; nd <[email protected]>
Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory
31/08/2020 20:45, Honnappa Nagarahalli:
>
> Hi Diogo,
>
> Thanks for your explanation.
>
> As documented in https://developer.arm.com/documentation/ddi0487/fc B2.9.5
> Load-Exclusive and Store-Exclusive instruction usage restrictions:
> " Between the Load-Exclusive and the Store-Exclusive, there are no
> explicit memory accesses, preloads, direct or indirect System register
> writes, address translation instructions, cache or TLB maintenance
> instructions, exception generating instructions, exception returns, or
> indirect branches."
> [Honnappa] This is a requirement on the software, not on the
> micro-architecture.
> We are having few discussions internally, will get back soon.
>
> So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg
> operation is atomic.
Please, what is the conclusion?