On Sun, 30 Oct 2022 15:04:18 +0100 Morten Brørup <m...@smartsharesystems.com> wrote:
> > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > Sent: Sunday, 30 October 2022 12.55
> >
> > Split statistics from debug, to make mempool statistics available
> > without the performance cost of continuously validating the cookies
> > in the mempool elements.
>
> mempool_perf_autotest shows that the rate_persec drops to a third (-66 %)
> when enabling full mempool debug - quite prohibitive!
>
> With this patch, the performance cost is much lower. I don't have detailed
> test results, but the initial tests without the mempool cache show a
> performance drop of -11 %. Significant, but not prohibitive.

One trick to avoid the conditional in the fast path would be to add one
dummy stats[] slot for non-DPDK threads, as in the diff below. Another
would be to move the fast-path get/put stats into the mempool_cache; in
the current model, the per-cpu stats[] is always on another cache line.

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 1f5707f46a21..87905b7286a6 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -236,8 +236,10 @@ struct rte_mempool {
 	struct rte_mempool_memhdr_list mem_list; /**< List of memory chunks */
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-	/** Per-lcore statistics. */
-	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
+	/** Per-lcore statistics.
+	 * Allocate one additional per-cpu slot for non-DPDK threads.
+	 */
+	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
 } __rte_cache_aligned;
 
@@ -302,10 +304,7 @@ struct rte_mempool {
  */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                 \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			mp->stats[__lcore_id].name += n;        \
-		}                                               \
+		(mp)->stats[rte_lcore_id()].name += n;          \
 	} while (0)
 #else
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
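
One caveat on the sketch above: rte_lcore_id() returns LCORE_ID_ANY
(UINT32_MAX) for unregistered non-EAL threads, so indexing stats[] with it
directly would overrun the array. The extra slot only helps if the index is
clamped, e.g. something like:

	/* Sketch only: clamp the lcore id so unregistered non-EAL threads
	 * (rte_lcore_id() == LCORE_ID_ANY) land in the extra stats slot
	 * instead of indexing out of bounds.
	 */
	#define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                  \
			unsigned int __lcore_id = rte_lcore_id();       \
			if (unlikely(__lcore_id >= RTE_MAX_LCORE))      \
				__lcore_id = RTE_MAX_LCORE;             \
			(mp)->stats[__lcore_id].name += n;              \
		} while (0)

This keeps the store unconditional for registered lcores at the cost of one
well-predicted branch.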
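
For the second idea, a rough sketch of what moving the counters into the
per-lcore cache could look like; the field and macro names below are
illustrative, not existing DPDK API:

	/* Sketch only: keep fast-path counters next to (or inside) struct
	 * rte_mempool_cache, which the lcore already touches on every
	 * get/put, so updating them dirties no extra cache line.
	 */
	struct rte_mempool_cache_stats {
		uint64_t get_success_objs; /* objects taken from the pool */
		uint64_t put_objs;         /* objects returned to the pool */
	};

	/* Hypothetical helper: 'cache' is the rte_mempool_cache pointer the
	 * fast path already holds; 'stats' would be a new field in it.
	 */
	#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) \
		((cache)->stats.name += (n))

Unregistered non-EAL threads have no default cache, so they would still need
a slow-path fallback such as the clamped stats[] above.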