> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, January 22, 2015 9:17 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
>
> For a non-EAL thread, bypass the per-lcore cache and use the ring pool
> directly. This allows rte_mempool to be used from either an EAL thread
> or any user pthread. In a non-EAL thread it relies directly on rte_ring,
> which is non-preemptive, so running multiple pthreads per CPU that
> compete for the same rte_mempool is not recommended: it performs badly
> and carries a critical risk if the scheduling policy is real-time.
>
> Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> ---
>  lib/librte_mempool/rte_mempool.h | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 3314651..4845f27 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -198,10 +198,12 @@ struct rte_mempool {
>   * Number to add to the object-oriented statistics.
>   */
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> -#define __MEMPOOL_STAT_ADD(mp, name, n) do {			\
> -		unsigned __lcore_id = rte_lcore_id();		\
> -		mp->stats[__lcore_id].name##_objs += n;		\
> -		mp->stats[__lcore_id].name##_bulk += 1;		\
> +#define __MEMPOOL_STAT_ADD(mp, name, n) do {			\
> +		unsigned __lcore_id = rte_lcore_id();		\
> +		if (__lcore_id < RTE_MAX_LCORE) {		\
> +			mp->stats[__lcore_id].name##_objs += n;	\
> +			mp->stats[__lcore_id].name##_bulk += 1;	\
> +		}						\
>  	} while(0)
>  #else
>  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>  	__MEMPOOL_STAT_ADD(mp, put, n);
>
>  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> -	/* cache is not enabled or single producer */
> -	if (unlikely(cache_size == 0 || is_mp == 0))
> +	/* cache is not enabled or single producer or non-EAL thread */
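If I read the intent correctly, the new lcore_id >= RTE_MAX_LCORE test is
what routes unregistered pthreads around the per-lcore cache. A minimal
sketch of the call path this enables, as I understand it (my illustration,
not part of the patch; it assumes rte_lcore_id() returns a value >=
RTE_MAX_LCORE, such as LCORE_ID_ANY, in a thread not created by the EAL):

    #include <pthread.h>
    #include <rte_lcore.h>
    #include <rte_mempool.h>

    /* worker running in a plain pthread, not on an EAL lcore */
    static void *worker(void *arg)
    {
        struct rte_mempool *mp = arg;
        void *obj;

        /* rte_lcore_id() is not a valid lcore index here, so with this
         * patch both get and put skip local_cache[] and go straight to
         * the underlying rte_ring (multi-producer/multi-consumer path) */
        if (rte_mempool_get(mp, &obj) == 0)
            rte_mempool_put(mp, obj);
        return NULL;
    }
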
I don't understand this limitation. I see that rte_mempool.h defines the
cache table per RTE_MAX_LCORE, like below:

    #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
    /** Per-lcore local cache. */
    struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
    #endif

But why can't we extend the local cache table to something like
RTE_MAX_THREADS, sized so that it is never exceeded by the value of
rte_lcore_id()? Keeping this condition here is a real performance killer!
In my test application I saw more than 95% of CPU time spent reading the
atomics in the MC/MP ring while accessing the mempool.

Same comment for the get operation below.

> +	if (unlikely(cache_size == 0 || is_mp == 0 ||
> +		     lcore_id >= RTE_MAX_LCORE))
>  		goto ring_enqueue;
>
>  	/* Go straight to ring if put would overflow mem allocated for cache */
> @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>  	uint32_t cache_size = mp->cache_size;
>
>  	/* cache is not enabled or single consumer */
> -	if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> +	if (unlikely(cache_size == 0 || is_mc == 0 ||
> +		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
>  		goto ring_dequeue;
>
>  	cache = &mp->local_cache[lcore_id];
> --
> 1.8.1.4
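To make the suggestion above concrete, here is the direction I have in
mind (a sketch only; RTE_MAX_THREADS is a hypothetical new config
constant, not an existing DPDK define, and the exact sizing is open):

    /* hypothetical: size the cache table per thread id instead of per
     * lcore, so registered non-EAL threads keep a local cache too */
    #define RTE_MAX_THREADS 256	/* assumed upper bound on thread ids */

    #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
    /** Per-thread local cache. */
    struct rte_mempool_cache local_cache[RTE_MAX_THREADS];
    #endif

Each extra slot costs sizeof(struct rte_mempool_cache) per mempool, so
the table cannot grow without bound, but it would let pinned user
pthreads keep the fast cache path instead of hitting the ring atomics.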