> >
> > > > 1. rte_ring_generic_pvt.h:
> > > > =====================
> > > >
> > > > pseudo-c-code                   // related armv8 instructions
> > > > --------------------            --------------------------------------
> > > > head.load()                     // ldr [head]
> > > > rte_smp_rmb()                   // dmb ishld
> > > > opposite_tail.load()            // ldr [opposite_tail]
> > > > ...
> > > > rte_atomic32_cmpset(head, ...)  // ldrex[head]; ... stlex[head]
> > > >
> > > >
> > > > 2. rte_ring_c11_pvt.h
> > > > =====================
> > > >
> > > > pseudo-c-code                       // related armv8 instructions
> > > > --------------------                --------------------------------------
> > > > head.atomic_load(relaxed)           // ldr [head]
> > > > atomic_thread_fence(acquire)        // dmb ish
> > > > opposite_tail.atomic_load(acquire)  // lda [opposite_tail]
> > > > ...
> > > > head.atomic_cas(..., relaxed)       // ldrex[head]; ... strex[head]
> > > >
> > > >
> > > > 3. rte_ring_hts_elem_pvt.h
> > > > ==========================
> > > >
> > > > pseudo-c-code                   // related armv8 instructions
> > > > --------------------            --------------------------------------
> > > > head.atomic_load(acquire)       // lda [head]
> > > > opposite_tail.load()            // ldr [opposite_tail]
> > > > ...
> > > > head.atomic_cas(..., acquire)   // ldaex[head]; ... strex[head]
> > > >
> > > > The questions that arose from these observations:
> > > > a) are all 3 approaches equivalent in terms of functionality?
> > > No, they are different: lda (load with acquire semantics) and ldr (plain load) are not the same.
> >
> > I understand that, my question was:
> > lda [head]; ldr [tail]
> > vs
> > ldr [head]; dmb ishld; ldr [tail];
> >
> > Is there any difference in terms of functionality (memory ops
> > ordering/observability)?
>
> To be more precise:
>
> lda [head]; ldr [tail]
> vs
> ldr [head]; dmb ishld; ldr [tail];
> vs
> ldr [head]; dmb ishld; lda [tail];
>
> what would be the difference between these 3 cases?
Case A: lda [head]; ldr [tail]
load of the head will be observed by the memory subsystem
before the load of the tail.
Case B: ldr [head]; dmb ishld; ldr [tail];
load of the head will be observed by the memory subsystem
before the load of the tail.
Case C: ldr [head]; dmb ishld; lda [tail];
load of the head will be observed by the memory subsystem
before the load of the tail. In addition, any load or store that follows
lda [tail] in program order will not be observed by the memory subsystem
before the load of the tail.
Essentially, cases A and B are the same.
They both preserve the following program orders:
LOAD-LOAD
LOAD-STORE
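
For illustration only, the three cases could be written with C11 atomics
roughly as below. This is a sketch, not the code from any of the ring
headers: struct ht_pair and the distance_case_*() names are made up for
this mail, and the armv8 mnemonics in the comments simply repeat the
mapping quoted earlier in the thread. What the compiler actually emits
depends on the toolchain and flags.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical head/tail pair; a stand-in, not the DPDK ring layout. */
struct ht_pair {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* Case A: acquire load of head, then relaxed load of tail.
 * Thread notation: lda [head]; ldr [tail] */
static inline uint32_t
distance_case_a(struct ht_pair *p)
{
	uint32_t h = atomic_load_explicit(&p->head, memory_order_acquire);
	uint32_t t = atomic_load_explicit(&p->tail, memory_order_relaxed);
	return t - h;
}

/* Case B: relaxed load, acquire fence, relaxed load.
 * Thread notation: ldr [head]; dmb ishld; ldr [tail]
 * (the fence stands in for rte_smp_rmb() here; that is an assumption) */
static inline uint32_t
distance_case_b(struct ht_pair *p)
{
	uint32_t h = atomic_load_explicit(&p->head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	uint32_t t = atomic_load_explicit(&p->tail, memory_order_relaxed);
	return t - h;
}

/* Case C: as case B, but the tail is loaded with acquire as well, so
 * later loads/stores in program order cannot move ahead of that load.
 * Thread notation: ldr [head]; dmb ishld; lda [tail] */
static inline uint32_t
distance_case_c(struct ht_pair *p)
{
	uint32_t h = atomic_load_explicit(&p->head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	uint32_t t = atomic_load_explicit(&p->tail, memory_order_acquire);
	return t - h;
}
```

Single-threaded, all three return the same value; they differ only in
which reorderings are allowed when other threads update head and tail
concurrently.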