I have an app which captures packets on a single core and then passes them to multiple workers on different lcores using rte_ring queues.
While I manage to capture packets at 10Gbps, there is substantial packet loss once they are handed off to the processing lcores. At first I figured it was the processing I do on the packets; optimizing that helped a little but did not alleviate the problem. I then profiled the program with Intel VTune Amplifier, and in every profiling run the majority of the time (about 70%) is spent in "__rte_ring_sc_do_dequeue". I was wondering if anyone can tell me how to optimize this, whether I'm using the queues incorrectly, or whether my profiling is off (I do find it odd that dequeuing would be this slow).

My program architecture is as follows (constants replaced with their actual values).

A queue of 1024*1024 entries is created for each processing lcore:

    rte_ring_create(qname, 1024*1024, NUMA_SOCKET, RING_F_SP_ENQ | RING_F_SC_DEQ);

The capture core enqueues packets one by one to each of the queues (the packet burst size is 256):

    rte_ring_sp_enqueue(lc[queue_index].queue, (void *const)pkts[i]);

These are then dequeued in bulk on the processing lcores:

    rte_ring_sc_dequeue_bulk(lc->queue, (void**) &mbufs, 128);

I'm using 16 1GB hugepages and running DPDK 2.0. If any further info about the program is required, let me know. Thank you.
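To make the structure clearer, here is a stripped-down sketch of the two loops. EAL, mempool, and port initialization and the rte_eal_remote_launch() calls are omitted, and NB_WORKERS, the burst constants, the function names, and the round-robin distribution are placeholders rather than my exact code:

    /*
     * Stripped-down sketch of the capture/worker structure described above.
     * EAL, mempool, port setup, and lcore launching are omitted.
     * NB_WORKERS, the burst sizes, and the round-robin distribution are
     * placeholders, not the exact production code.
     */
    #include <errno.h>
    #include <stdio.h>

    #include <rte_ring.h>
    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    #define NB_WORKERS    4            /* placeholder: number of processing lcores */
    #define RING_ENTRIES  (1024 * 1024)
    #define CAPTURE_BURST 256
    #define WORKER_BURST  128

    static struct rte_ring *worker_ring[NB_WORKERS];

    /* One single-producer/single-consumer ring per processing lcore. */
    static void
    create_rings(int socket)
    {
        char name[32];
        unsigned i;

        for (i = 0; i < NB_WORKERS; i++) {
            snprintf(name, sizeof(name), "worker_ring_%u", i);
            worker_ring[i] = rte_ring_create(name, RING_ENTRIES, socket,
                                             RING_F_SP_ENQ | RING_F_SC_DEQ);
            /* error handling omitted */
        }
    }

    /* Capture lcore: receive a burst, hand packets to the workers one by one. */
    static int
    capture_loop(void *arg)
    {
        struct rte_mbuf *pkts[CAPTURE_BURST];
        unsigned i, nb_rx, next = 0;

        (void)arg;
        for (;;) {
            nb_rx = rte_eth_rx_burst(0, 0, pkts, CAPTURE_BURST);
            for (i = 0; i < nb_rx; i++) {
                /* single-item enqueue, one call per packet */
                if (rte_ring_sp_enqueue(worker_ring[next], pkts[i]) == -ENOBUFS)
                    rte_pktmbuf_free(pkts[i]);   /* ring full: drop */
                next = (next + 1) % NB_WORKERS;
            }
        }
        return 0;
    }

    /* Worker lcore: pull packets back out in fixed bulks of 128 and process. */
    static int
    worker_loop(void *arg)
    {
        struct rte_ring *ring = arg;
        struct rte_mbuf *mbufs[WORKER_BURST];
        unsigned i;

        for (;;) {
            /* In DPDK 2.0 the bulk dequeue is all-or-nothing: it returns
             * -ENOENT and takes nothing unless 128 entries are available. */
            if (rte_ring_sc_dequeue_bulk(ring, (void **)mbufs, WORKER_BURST) != 0)
                continue;
            for (i = 0; i < WORKER_BURST; i++) {
                /* ... per-packet processing ... */
                rte_pktmbuf_free(mbufs[i]);
            }
        }
        return 0;
    }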