> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > > Sent: Thursday, April 30, 2020 3:12 AM
> > >
> > > <snip>
> > >
> > > Hi Morten,
> > >
> > > On Tue, Apr 28, 2020 at 03:53:15PM +0200, Morten Brørup wrote:
> > > > Olivier (maintainer of the Ring),
> > >
> > > I'm not anymore, CC'ing Konstantin and Honnappa.
> > >
> > > > I would like to suggest a couple of minor optimizations to the
> > > > ring library.
> > > >
> > > > 1. Testing if the ring is empty is as simple as comparing the
> > > > producer and consumer pointers:
> > > >
> > > > static inline int
> > > > rte_ring_empty(const struct rte_ring *r)
> > > > {
> > > > -	return rte_ring_count(r) == 0;
> > > > +	uint32_t prod_tail = r->prod.tail;
> > > > +	uint32_t cons_tail = r->cons.tail;
> > > > +	return cons_tail == prod_tail;
> > > > }
> > > >
> > > > In theory, this optimization reduces the number of potential
> > > > cache misses from 3 to 2 by not having to read r->mask in
> > > > rte_ring_count().
> > >
> > > This one looks correct to me.
> > >
> > > > 2. It is not possible to enqueue more elements than the capacity
> > > > of a ring, so the count function does not need to test if the
> > > > capacity is exceeded:
> > > >
> > > > static inline unsigned
> > > > rte_ring_count(const struct rte_ring *r)
> > > > {
> > > > 	uint32_t prod_tail = r->prod.tail;
> > > > 	uint32_t cons_tail = r->cons.tail;
> > > > 	uint32_t count = (prod_tail - cons_tail) & r->mask;
> > > > -	return (count > r->capacity) ? r->capacity : count;
> > > > +	return count;
> > > > }
> > > >
> > > > I cannot even come up with a race condition in this function
> > > > where the count would exceed the capacity. Maybe I missed
> > > > something?
> > >
> > > Since there is no memory barrier, the order in which prod_tail and
> > > cons_tail are fetched is not guaranteed. Or the thread could be
> > > interrupted by the kernel in between.
> > The '__rte_ring_move_prod_head' function ensures that the distance
> > between prod.head and cons.tail is always within the capacity,
> > irrespective of whether the consumers/producers are sleeping.
>
> Yes, this is exactly what I was thinking.
>
> The tails are the pointers after any updates, which is shown very
> nicely in the documentation. And certainly prod.tail will not move
> further ahead than prod.head.
>
> So it makes sense to use the tails outside the functions that move
> the pointers.
>
> Olivier found the race I couldn't find:
>
> 1. The thread calls rte_ring_count(); since there is no memory
>    barrier, it reads cons.tail but has not yet read prod.tail.
> 2. Other threads use the ring simultaneously: a consumer thread moves
>    cons.tail ahead, and a producer thread then moves prod.tail ahead.
>    Note: without first moving cons.tail ahead, prod.tail cannot move
>    too far ahead.
> 3. The thread proceeds to read prod.tail. Now the count is completely
>    incorrect, and may even exceed the capacity.
>
> Olivier pointed out that this could happen if the rte_ring_count
> thread is interrupted by the kernel, and I agree. However, intuitively
> I don't think it can happen in a non-EAL thread, because the consumer
> thread needs to finish moving cons.tail before the producer thread can
> move prod.tail too far ahead, and by then the rte_ring_count thread
> has had plenty of time to read prod.tail. But it could happen in
> theory.
>
> > This function may return invalid results in MC/MP cases.
> > We just ensure that the result is between 0 and r->capacity.
>
> So should we update the documentation to say that it might return an
> incorrect count (if there is a race), or should we fix the function
> to always provide a correct value?
As long as you invoke rte_ring_count() while there are other active
producers/consumers for that ring, its return value can always be
outdated and not reflect the current ring state. So I think just
updating the doc is enough.

> Furthermore, the same race condition probably affects rte_ring_empty()
> similarly, even in my improved version.
>
> And do these functions need to support non-EAL threads? I don't think
> so. What do you think?

Not sure why you distinguish between EAL and non-EAL threads here. The
only difference between them from the scheduler's point of view is that
EAL threads have their CPU affinity set by rte_eal_init(). But nothing
prevents the user from setting/updating the CPU affinity of any thread
in his process in a way he likes.