> This patch replaces use of the deprecated rte_atomic32 code with
> GCC builtin atomic operations.
> 
> Although it would be preferable to use C11 version on all architectures,
> there is a performance loss if we do it that way:
> 
> Measured on i9-13900H, two physical cores MP/MC bulk n=128, 10 runs:
>   with C11 builtin:           5.86 cycles/elem
>   with __sync builtin:        5.36 cycles/elem  (-9.4%)
> 
> The C11 __atomic_compare_exchange_n builtin writes the actual value back
> to its expected pointer on failure. On x86 this forces GCC
> to emit extra instructions on the critical path between the CAS
> and the success-test.
> 
> __sync_bool_compare_and_swap returns a plain bool with no pointer
> writeback, allowing GCC to emit tighter code.
> 
> Signed-off-by: Stephen Hemminger <[email protected]>
> ---
>  lib/ring/meson.build                          |  2 +-
>  lib/ring/rte_ring_c11_pvt.h                   |  3 +-
>  lib/ring/rte_ring_elem_pvt.h                  |  2 +-
>  ..._ring_generic_pvt.h => rte_ring_gcc_pvt.h} | 37 +++++++++++--------
>  4 files changed, 24 insertions(+), 20 deletions(-)
>  rename lib/ring/{rte_ring_generic_pvt.h => rte_ring_gcc_pvt.h} (87%)
> 
> diff --git a/lib/ring/meson.build b/lib/ring/meson.build
> index 21f2c12989..2ba160b178 100644
> --- a/lib/ring/meson.build
> +++ b/lib/ring/meson.build
> @@ -9,7 +9,7 @@ indirect_headers += files (
>          'rte_ring_elem.h',
>          'rte_ring_elem_pvt.h',
>          'rte_ring_c11_pvt.h',
> -        'rte_ring_generic_pvt.h',
> +        'rte_ring_gcc_pvt.h',
>          'rte_ring_hts.h',
>          'rte_ring_hts_elem_pvt.h',
>          'rte_ring_peek.h',
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index 5afc14dec9..8358b0f21f 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -43,7 +43,6 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht,
> uint32_t old_val,
>        */
>       rte_atomic_store_explicit(&ht->tail, new_val,
> rte_memory_order_release);
>  }
> -
>  /**
>   * @internal This is a helper function that moves the producer/consumer head
>   *    optimized for single threaded case
> @@ -82,7 +81,7 @@ __rte_ring_headtail_move_head_st(struct rte_ring_headtail
> *d,
>       /* Single producer: only this thread writes d->head,
>        * so a relaxed load is sufficient.
>        */
> -     *old_head = rte_atomic_load_explicit(&d->head,
> rte_memory_order_relaxed);
> +     *old_head = rte_atomic_load_explicit(&d->head,
>       rte_memory_order_acquire);

Not sure, why it had changed to 'acquire' here?
Looks like just patch splitting mistake, no?

> 
>       /* Acquire pairs with the consumer's release-store of tail in
> __rte_ring_update_tail,
>        * ensuring the consumer's ring-element reads are complete before

Reply via email to