Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, February 06, 2015 3:20 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin
> forever
>
> Hi,
>
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a sched_yield() syscall if the thread spins for too long, waiting for
> > another thread to finish its operations on the ring.
> > That gives a pre-empted thread a chance to proceed and finish the ring
> > enqueue/dequeue operation.
> > The purpose is to reduce contention on the ring.
> >
> > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 35 +++++++++++++++++++++++++++++------
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 39bacdd..c402c73 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> >  #define RTE_RING_MZ_PREFIX "RG_"
> > +#define RTE_RING_PAUSE_REP 0x100 /**< yield after num of times pause. */
> >
> >  /**
> >   * An RTE ring structure.
> > @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	uint32_t cons_tail, free_entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >  	int ret;
> >
> > @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	 * If there are other enqueues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->prod.tail != prod_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +			rep != 0 && r->prod.tail != prod_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * It gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring enqueue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	r->prod.tail = prod_next;
> >  	return ret;
> > @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	uint32_t cons_next, entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >
> >  	/* move cons.head atomically */
> > @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	 * If there are other dequeues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->cons.tail != cons_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +			rep != 0 && r->cons.tail != cons_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * It gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring dequeue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	__RING_STAT_ADD(r, deq_success, n);
> >  	r->cons.tail = cons_next;
>
> The ring library was designed with the assumption that the code is not
> preemptible. The code is lock-free but not wait-free. Actually, if a
> thread is preempted at a bad moment, the others can spin forever until
> it is scheduled again.
>
> I wonder if adding a sched_yield() may not penalize the current
> implementations that only use one pthread per core? Even if there
> is only one pthread in the scheduler queue for this CPU, calling
> the scheduler code may cost thousands of cycles.
>
> Also, where does the value "RTE_RING_PAUSE_REP 0x100" come from?
> Why is 0x100 better than 42 or than 10000?
The idea was to have something a few times bigger than the actual number of
active cores in the system, to minimise the chance of sched_yield() being
called in the case where we have one thread per physical core. My thought was
that having that many repeats would make such a chance negligible. Though,
right now, I don't have any data to back it up.

> I think it could be good to check if there is a performance impact
> with this change, especially where there is a lot of contention on
> the ring. If it has an impact, what about adding a compile or runtime
> option?

Good idea. We could probably make RTE_RING_PAUSE_REP a configuration option
and avoid emitting the sched_yield() call at all when RTE_RING_PAUSE_REP == 0.

Konstantin

>
> Regards,
> Olivier