Hi Olivier,

Please see my comments.
        do {
                prod_head = r->prod.head;
                cons_tail = r->cons.tail;
                prod_next = prod_head + n;
                success = rte_atomic32_cmpset(&r->prod.head, prod_head,
                                              prod_next);

                /*
                 * Why not enqueue the data here? It would only be a couple
                 * of pointer assignments, which would not take much time.
                 * The CAS loop would then contain both the head adjustment
                 * and the data enqueue, so a dequeue operation would have no
                 * chance to interfere with the data being produced. The wait
                 * loop below could be removed accordingly (a sketch of this
                 * follows the fragment).
                 */

        } while (unlikely(success == 0));

        /*
        while (unlikely(r->prod.tail != prod_head))
                rte_pause();

        r->prod.tail = prod_next;
        */
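
To make the suggestion concrete, here is a rough sketch of what I mean,
rearranging the fragment above (declarations, the free-space check and the
error paths are as in the original enqueue function; r->ring and
r->prod.mask are the field names as I read them in rte_ring.h):

        do {
                prod_head = r->prod.head;
                cons_tail = r->cons.tail;
                /* free-space check against cons_tail as in the original */
                prod_next = prod_head + n;

                success = rte_atomic32_cmpset(&r->prod.head, prod_head,
                                              prod_next);
                if (success) {
                        /* copy the object pointers right away, while we
                         * are still inside the reservation we just made */
                        for (i = 0; i < n; i++)
                                r->ring[(prod_head + i) & r->prod.mask] =
                                                obj_table[i];

                        /* publish immediately; the separate wait loop on
                         * r->prod.tail would then go away */
                        r->prod.tail = prod_next;
                }
        } while (unlikely(success == 0));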


Regards,
Bob


-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier MATZ
Sent: Tuesday, August 20, 2013 4:22 PM
To: Bob Chen
Cc: dev
Subject: Re: [dpdk-dev] A question of DPDK ring buffer

Hello Bob,

> OK, here is the question: why does DPDK have to maintain that public 
> prod_tail field? Is it really necessary to endure a while loop here?

If you remove this wait loop, you can trigger an issue. Imagine a case where 
core 0 wants to add an object to the ring: it does the CAS, modifying 
prod_head. At this point it is interrupted for some reason (maybe by the kernel) 
before writing the object pointer into the ring, and thus before the modification 
of prod_tail.

During this time, core 1 wants to enqueue another object: it does the CAS, then 
writes the object pointer, then modifies prod_tail (without waiting for core 0, 
since we removed the wait loop).

Now the ring state is wrong: it shows 2 objects, but one object pointer is 
invalid. If you try to dequeue the objects, one of them will be a bad pointer.
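
To see the role of prod.tail, it helps to look at the whole multi-producer
enqueue sequence. Roughly (a simplified sketch, with the free-space check,
watermark and return codes dropped, and the pointer copy written as a plain
loop; field names as I read them in rte_ring.h):

        /* step 1: reserve n slots by moving prod.head with a CAS */
        do {
                prod_head = r->prod.head;
                cons_tail = r->cons.tail;
                prod_next = prod_head + n;
                success = rte_atomic32_cmpset(&r->prod.head, prod_head,
                                              prod_next);
        } while (unlikely(success == 0));

        /* step 2: write the object pointers into the reserved slots */
        for (i = 0; i < n; i++)
                r->ring[(prod_head + i) & r->prod.mask] = obj_table[i];
        rte_wmb();

        /* step 3: wait until every producer that reserved slots before us
         * has published them, then publish ours; prod.tail thus always
         * means "every slot below this index holds a valid pointer",
         * which is all the consumer relies on */
        while (unlikely(r->prod.tail != prod_head))
                rte_pause();
        r->prod.tail = prod_next;

In the scenario above, removing the wait loop lets core 1 run step 3 while
core 0 is still stuck before step 2, so prod.tail ends up covering a slot
that holds a stale pointer.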

Of course, interruption by the kernel should be avoided as much as possible, 
but even without being interrupted, a similar scenario can occur if one core is 
slower than another to enqueue its data (because of a cache miss, for instance, 
or because the first core enqueues more objects than the other).

To convince yourself, you can remove the wait loop and run the ring test in 
app/test/test_ring.c; I suppose it won't pass.
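
If you want something smaller than test_ring.c, below is the kind of check
I mean, as a hypothetical sketch (this is not code from the test; lcore
setup with rte_eal_remote_launch() and ring creation with rte_ring_create()
are omitted). Producers only ever enqueue pointers into a known array, so
any other pointer coming out of the dequeue means a slot was published
before being written:

#include <rte_ring.h>
#include <rte_cycles.h>
#include <rte_debug.h>

#define N_OBJ  1024
#define LOOPS  1000000UL

static struct rte_ring *r;   /* created elsewhere with rte_ring_create() */
static int objs[N_OBJ];      /* the only objects producers ever enqueue */

/* producer lcore: enqueue pointers into objs[] over and over */
static int
producer(__attribute__((unused)) void *arg)
{
        unsigned long i;

        for (i = 0; i < LOOPS; i++)
                while (rte_ring_mp_enqueue(r, &objs[i % N_OBJ]) != 0)
                        rte_pause();
        return 0;
}

/* consumer lcore: every pointer we get back must point inside objs[];
 * assumes two producer lcores, hence 2 * LOOPS objects in total */
static int
consumer(__attribute__((unused)) void *arg)
{
        void *obj;
        unsigned long n;

        for (n = 0; n < 2 * LOOPS; ) {
                if (rte_ring_mc_dequeue(r, &obj) != 0)
                        continue;
                if ((int *)obj < objs || (int *)obj >= objs + N_OBJ)
                        rte_panic("dequeued a pointer that was never enqueued\n");
                n++;
        }
        return 0;
}

With the wait loop in place this runs clean; with it removed, the consumer can
end up reading slots that were never written for the current round, which on a
freshly created ring shows up as a pointer outside objs[].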

Regards,
Olivier
