> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, January 22, 2015 5:53 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, January 22, 2015 9:17 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > thread
> >
> > For a non-EAL thread, bypass the per-lcore cache and use the ring pool
> > directly. This allows rte_mempool to be used from either an EAL thread or
> > any user pthread.
> > In a non-EAL thread it relies directly on rte_ring, which is not
> > preemptible. Running multiple pthreads per CPU that compete for the same
> > rte_mempool is not recommended: performance will be poor, and there is a
> > critical risk if the scheduling policy is real-time (RT).
> >
> > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > ---
> >  lib/librte_mempool/rte_mempool.h | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.h
> > b/lib/librte_mempool/rte_mempool.h
> > index 3314651..4845f27 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -198,10 +198,12 @@ struct rte_mempool {
> >   *   Number to add to the object-oriented statistics.
> >   */
> >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {                       \
> > -           unsigned __lcore_id = rte_lcore_id();           \
> > -           mp->stats[__lcore_id].name##_objs += n;         \
> > -           mp->stats[__lcore_id].name##_bulk += 1;         \
> > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {                    \
> > +           unsigned __lcore_id = rte_lcore_id();           \
> > +           if (__lcore_id < RTE_MAX_LCORE) {               \
> > +                   mp->stats[__lcore_id].name##_objs += n; \
> > +                   mp->stats[__lcore_id].name##_bulk += 1; \
> > +           }                                               \
> >     } while(0)
> >  #else
> >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> > void * const *obj_table,
> >     __MEMPOOL_STAT_ADD(mp, put, n);
> >
> >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > -   /* cache is not enabled or single producer */
> > -   if (unlikely(cache_size == 0 || is_mp == 0))
> > +   /* cache is not enabled or single producer or non-EAL thread */
> 
> I don't understand this limitation.
> 
> I see that rte_mempool.h defines the table with RTE_MAX_LCORE entries, like below
> #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>         /** Per-lcore local cache. */
>         struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> #endif
> 
> But why can't we extend the size of the local cache table to something like
> RTE_MAX_THREADS, chosen so that the value of rte_lcore_id() never exceeds it?
> 
> Keeping this condition here is a real performance killer!
> In my test application I saw more than 95% of the CPU time spent reading the
> atomic in the MC/MP ring used to access the mempool.
[Liang, Cunming] This is the first step to make it work.
Per Konstantin's comments, we should not allocate a unique id ourselves,
and the value returned by gettid() is too large to use as an index.
For the non-EAL thread performance gap, we will consider an additional fix patch here.
If performance is the main concern, an EAL thread is still the preferred choice for now.
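
For illustration, here is a minimal sketch (not part of the patch) of how a
non-EAL pthread would use a mempool after this change. It assumes, as this
series proposes, that rte_lcore_id() reports LCORE_ID_ANY (>= RTE_MAX_LCORE)
in such a thread, so the get/put below skip local_cache[] and operate on the
underlying ring:

#include <pthread.h>
#include <stdio.h>

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Entry point for a thread created with plain pthread_create(),
 * i.e. not an EAL lcore thread. */
static void *
non_eal_worker(void *arg)
{
        struct rte_mempool *mp = arg;
        void *obj;

        /* Expected to print an id >= RTE_MAX_LCORE in a non-EAL thread. */
        printf("rte_lcore_id() = %u, RTE_MAX_LCORE = %u\n",
               rte_lcore_id(), (unsigned)RTE_MAX_LCORE);

        /* Both calls fall through to the ring, not the per-lcore cache. */
        if (rte_mempool_get(mp, &obj) == 0) {
                /* ... use obj ... */
                rte_mempool_put(mp, obj);
        }
        return NULL;
}

The mempool itself would still be created by the EAL application (e.g. with
rte_mempool_create()); the patch only makes this call path safe by never
indexing local_cache[] with an out-of-range id.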
> 
> The same comment applies to the get operation below.
> 
> > +   if (unlikely(cache_size == 0 || is_mp == 0 ||
> > +                lcore_id >= RTE_MAX_LCORE))
> >             goto ring_enqueue;
> >
> >     /* Go straight to ring if put would overflow mem allocated for cache
> > */
> > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void
> > **obj_table,
> >     uint32_t cache_size = mp->cache_size;
> >
> >     /* cache is not enabled or single consumer */
> > -   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> > +   if (unlikely(cache_size == 0 || is_mc == 0 ||
> > +                n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> >             goto ring_dequeue;
> >
> >     cache = &mp->local_cache[lcore_id];
> > --
> > 1.8.1.4
