Hi Stephen,

> The rte_atomic32_cmpset is deprecated. Initial attempts at
> changing this with direct conversion to
> rte_atomic_compare_exchange_weak_explicit()
> regressed MP/MC contended performance on x86 by 10-30%,
> because the C11 builtin's failure-writeback semantic forces
> GCC to emit extra instructions on the CAS critical path.
> 
> Add an internal __rte_ring_compare_and_swap() wrapper that calls
> __sync_bool_compare_and_swap() directly, which keeps the original
> instruction sequence. Add equivalent function for MSVC.

In fact, in rte_ring we do have 2 implementations of the same core functions:
lib/ring/rte_ring_c11_pvt.h  - uses C11 atomics
lib/ring/rte_ring_generic_pvt.h - uses legacy instructions (smp_mb, extra), 
If we going remove these legacy instructions anyway (or reimplementing them 
using C11 atomics),
then there is probably no point to keep rte_ring_generic_pvt.h.
Konstantin

> 
> Signed-off-by: Stephen Hemminger <[email protected]>
> ---
>  lib/ring/rte_ring_generic_pvt.h | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index affd2d5ba7..0fb972de9e 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h
> @@ -18,6 +18,30 @@
>   * For more information please refer to <rte_ring.h>.
>   */
> 
> +/**
> + * @internal optimized version of compare exchange
> + *
> + * The C11 builtin's failure-writeback semantic generates worse code on x86.
> + * Unlike rte_atomic_compare_exchange_*_explicit(), this wrapper does NOT
> + * write the actual value back to a pointer on failure. Callers in a retry
> + * loop must reload the expected value explicitly on the next iteration.
> + *
> + * Full memory barrier, equivalent to rte_memory_order_seq_cst on both
> + * success and failure.
> + */
> +static __rte_always_inline bool
> +__rte_ring_compare_and_swap(volatile uint32_t *dst,
> +                         uint32_t expected, uint32_t desired)
> +{
> +#if defined(RTE_TOOLCHAIN_MSVC)
> +     return _InterlockedCompareExchange((volatile long *)dst,
> +                                        (long)desired, (long)expected)
> +             == (long)expected;
> +#else
> +     return __sync_bool_compare_and_swap(dst, expected, desired);
> +#endif
> +}
> +
>  /**
>   * @internal This function updates tail values.
>   */
> @@ -108,10 +132,10 @@ __rte_ring_headtail_move_head(struct
> rte_ring_headtail *d,
>               if (is_st) {
>                       d->head = *new_head;
>                       success = 1;
> -             } else
> -                     success = rte_atomic32_cmpset(
> -                                     (uint32_t *)(uintptr_t)&d->head,
> -                                     *old_head, *new_head);
> +             } else {
> +                     success = __rte_ring_compare_and_swap(
> +                                     &d->head, *old_head, *new_head);
> +             }
>       } while (unlikely(success == 0));
>       return n;
>  }
> --
> 2.53.0

Reply via email to