Hi,
Many thanks, Maciej, for the insights! DPDK also does this TX completion
in the send call, but as Ola explained, the problem arises when you
can't count on that call happening, e.g. because there is nothing new to
send, while you can't receive either because all the buffers are waiting
to be freed. That was the issue with odp_l2fwd under ODP-DPDK. The best
option is to increase your pool size, but it can take painfully long to
track down such a deadlock, and it is far from obvious how large the
pool should be. To handle that situation more gracefully, I've
implemented a way to flush buffers from other devices, but only when the
receive function returns 0 and there are no free buffers left in the
pool.
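In rough outline it looks like the sketch below; I'm assuming the plain
odp_pktio_recv() call, and the two helpers are placeholder names rather
than actual ODP-DPDK internals:

    #include <odp.h>

    /* Placeholder helpers, not real ODP-DPDK functions. */
    int  pool_has_free_buffers(void);
    void flush_tx_completions_all(odp_pktio_t skip);

    static int recv_with_flush(odp_pktio_t pktio, odp_packet_t pkts[], int num)
    {
            int received = odp_pktio_recv(pktio, pkts, num);

            /* Nothing arrived and the pool is empty: the missing buffers
             * are most likely parked in some TX ring, so reclaim
             * completions from the other devices before trying again. */
            if (received == 0 && !pool_has_free_buffers())
                    flush_tx_completions_all(pktio);

            return received;
    }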
Regards,
Zoli
On 02/06/15 10:34, Maciej Czekaj wrote:
Zoltan,
I am currently working on the ThunderX port, so I can offer some insight
into one of the implementations.
ThunderX has a more server-like network adapter, as opposed to Octeon or
QorIQ, so buffer management is done in software.
I think the problem with pool starvation affects mostly those kinds of
platforms and any mismanagement here may have dire costs.
On Thunder the buffers are handled the same way as in DPDK, so
transmitted buffers have to be "reclaimed" after the hardware has
finished processing them.
The best and most efficient way to free the buffers is to do it while
transmitting others on the same queue, i.e. in odp_pktio_send or in the
enqueue operation (a rough sketch follows the list below). There are
several reasons behind this:
1. The TX ring is accessed anyway, so this minimizes cache misses.
2. The TX ring's H/W registers are accessed while transmitting packets,
so information about ring occupancy is already extracted by software.
This minimizes the overhead of H/W register access, which may be quite
significant even on an internal PCI-E bus.
3. Any other scheme, e.g. doing it in the mempool or in RX as suggested
previously, incurs the extra overhead from points 1 and 2, plus further
overhead from synchronizing access to the ring:
    - accessing the TX ring from the mempool must be thread-safe, since
the mempool may be invoked from a different context than ring
transmission
    - accessing the transmission ring from the receive operation leads to
a similar thread-safety issue, where RX and TX, being independent
operations from the H/W perspective, must be additionally synchronized
with respect to each other
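A rough sketch of points 1 and 2 (the tx_ring helpers below are made-up
names for illustration, not ThunderX driver code):

    #include <odp.h>

    /* Hypothetical driver-level helpers, named only for illustration. */
    struct tx_ring;
    void tx_ring_reclaim(struct tx_ring *ring);
    int  tx_ring_enqueue(struct tx_ring *ring, odp_packet_t pkts[], int num);

    static int send_with_reclaim(struct tx_ring *ring,
                                 odp_packet_t pkts[], int num)
    {
            /* The ring's H/W registers are read for the transmit anyway,
             * so harvest the descriptors the hardware has finished with
             * and free their buffers here, with no extra cache misses
             * and no extra synchronization against RX or the mempool. */
            tx_ring_reclaim(ring);

            /* Then place the new packets on the same ring. */
            return tx_ring_enqueue(ring, pkts, num);
    }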
Summarizing, any high-performance implementation must live with the fact
that some buffers will be kept in the TX ring for a while, and choose
the mempool size accordingly. This is true at least for Thunder and
other similar "server" adapters. On the other hand, the issue may be
non-existent in specialized network processors, but then there is no
need for an extra API or extra software tricks anyway.
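As a purely illustrative lower bound (the variable names are mine, not
ThunderX driver parameters), "accordingly" could mean something like:

    /* Buffers parked in every TX ring, buffers pre-filled into every RX
     * ring, and each worker's in-flight burst must all fit in the pool. */
    unsigned pool_size = num_ports * (tx_ring_size + rx_ring_size)
                       + num_workers * burst_size;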
Memory pressure may come not only from the TX ring but from the RX ring
as well, when it is flooded with packets. That leads to the same
challenge, only reversed: the receive function greedily allocates
packets to feed the H/W with as many free buffers as possible, and there
is currently no way to limit that.
That is why, from the Thunder perspective, a practical solution is:
- explicitly stating the "depth" of the engine (both RX and TX) through
an API or some parameter, and letting the implementer choose how to deal
with the problem (see the sketch after this list)
- adding a note that transmission functions are responsible for buffer
cleanup, to let the application choose the best strategy
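One possible shape of such a "depth" knob, purely illustrative (nothing
like this exists in the ODP API today):

    #include <stdint.h>

    /* Hypothetical configuration, only to illustrate the idea above. */
    struct pktio_ring_config {
            uint32_t rx_depth;  /* max free buffers pre-allocated to RX    */
            uint32_t tx_depth;  /* max sent buffers left unreclaimed on TX */
    };

With both depths known up front, the application can size its pool the
way sketched earlier.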
This is by no means a silver bullet, but it gives the user the tools to
deal with the problem and at the same time does not impose unnecessary
overhead on certain implementations.
Cheers
Maciej
2015-05-29 18:03 GMT+02:00 Zoltan Kiss <[email protected]>:
Hi,
On 29/05/15 16:58, Jerin Jacob wrote:
I agree. Is it possible to dedicate "core 0"/"any core" in the ODP-DPDK
implementation to do the housekeeping job? If we are planning the
ODP-DPDK implementation as just a wrapper over the DPDK API, then there
will not be any added value in using the ODP API. At least from my
experience, we have changed our SDK "a lot" to fit into the ODP model.
IMO that kind of effort will be required for a useful ODP-DPDK port.
It would be good to have some input from other implementations as
well: when do you release the sent packets in the Cavium implementation?
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp