>
>
> > The function __rte_ring_headtail_move_head() assumes that the barrier
> > (fence) between the load of the head and the load-acquire of the
> > opposing tail guarantees the following: if a first thread reads tail
> > and then writes head and a second thread reads the new value of head
> > and then reads tail, then it should observe the same (or a later)
> > value of tail.
> >
> > This assumption is incorrect under the C11 memory model. If the barrier
> > (fence) is intended to establish a total ordering of ring operations,
> > it fails to do so. Instead, the current implementation only enforces a
> > partial ordering, which can lead to unsafe interleavings. In particular,
> > some partial orders can cause underflows in free slot or available
> > element computations, potentially resulting in data corruption.
>
> Hmm... sounds exactly like the problem from the patch we discussed earlier that year:
> https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.anan...@huawei.com/
> In two words:
> "... thread can see 'latest' 'cons.head' value, with 'previous' value for
> 'prod.tail' or vice versa.
> In other words: 'cons.head' value depends on 'prod.tail', so before making
> the latest 'cons.head' value visible to other threads, we need to ensure
> that the latest 'prod.tail' is also visible."
> Is that the one?
>
> > The issue manifests when a CPU first acts as a producer and later as a
> > consumer. In this scenario, the barrier assumption may fail when another
> > core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> > this violation. The problem has not been widely observed so far because:
> > (a) on strong memory models (e.g., x86-64) the assumption holds, and
> > (b) on relaxed models with RCsc semantics the ordering is still strong
> > enough to prevent hazards.
> > The problem becomes visible only on weaker models, where load-acquire is
> > implemented with RCpc semantics (e.g. some AArch64 CPUs that support
> > the LDAPR and LDAPUR instructions).
> >
> > Three possible solutions exist:
> > 1. Strengthen ordering by upgrading release/acquire semantics to
> > sequential consistency. This requires using seq-cst for stores,
> > loads, and CAS operations. However, this approach introduces a
> > significant performance penalty on relaxed-memory architectures.
> >
> > 2. Establish a safe partial order by enforcing a pair-wise
> > happens-before relationship between threads of the same role:
> > convert the CAS to release semantics and the preceding load of
> > the head to acquire semantics. This approach makes the original
> > barrier assumption unnecessary and allows its removal.
>
> For the sake of clarity, can you outline the exact code changes for
> approach #2? Same as in that patch:
> https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.anan...@huawei.com/
> Or something different?
>
> > 3. Retain partial ordering but ensure only safe partial orders are
> > committed. This can be done by detecting underflow conditions
> > (producer < consumer) and quashing the update in such cases.
> > This approach makes the original barrier assumption unnecessary
> > and allows its removal.
>
> > This patch implements solution (3) for performance reasons.
> >
> > Signed-off-by: Wathsala Vithanage <wathsala.vithan...@arm.com>
> > Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Reviewed-by: Dhruv Tripathi <dhruv.tripa...@arm.com>
> > ---
> > lib/ring/rte_ring_c11_pvt.h | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > index b9388af0da..e5ac1f6b9e 100644
> > --- a/lib/ring/rte_ring_c11_pvt.h
> > +++ b/lib/ring/rte_ring_c11_pvt.h
> > @@ -83,9 +83,6 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
> > *d,
> > /* Reset n to the initial burst count */
> > n = max;
> >
> > - /* Ensure the head is read before tail */
> > - rte_atomic_thread_fence(rte_memory_order_acquire);
> > -
> > /* load-acquire synchronize with store-release of ht->tail
> > * in update_tail.
> > */
>
> But then cons.head can be read after prod.tail (and vice versa), right?
>
> > @@ -99,6 +96,13 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
> > *d,
> > */
> > *entries = (capacity + stail - *old_head);
> >
> > + /*
> > + * Ensure the entries calculation was not based on a stale
> > + * and unsafe stail observation that causes underflow.
> > + */
> > + if ((int)*entries < 0)
> > + *entries = 0;
> > +
> > /* check that we have enough room in ring */
> > if (unlikely(n > *entries))
> > n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > --
> > 2.43.0
> >