Hi,
Many thanks, Maciej, for the insights! DPDK also does this TX completion
in the send call, but as Ola explained, the problem arises when you
can't count on that call happening, e.g. because there is nothing new to
send, while you can't receive either because all the buffers are waiting
to be freed. That was the issue with odp_l2fwd under ODP-DPDK. The best
option is to increase your pool size, but it can take painfully long to
track down such a deadlock, and it is far from obvious how large the
pool should be. To handle that situation more gracefully, I've
implemented a way to flush buffers from other devices, but only when the
receive function returns 0 and there are no free buffers left in the
pool.
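In rough outline it looks like the sketch below; I'm assuming the plain
odp_pktio_recv() call, and the two helpers are placeholder names rather
than actual ODP-DPDK internals:

    #include <odp.h>

    /* Placeholder helpers, not real ODP-DPDK functions. */
    int  pool_has_free_buffers(void);
    void flush_tx_completions_all(odp_pktio_t skip);

    static int recv_with_flush(odp_pktio_t pktio, odp_packet_t pkts[], int num)
    {
            int received = odp_pktio_recv(pktio, pkts, num);

            /* Nothing arrived and the pool is empty: the missing buffers
             * are most likely parked in some TX ring, so reclaim
             * completions from the other devices before trying again. */
            if (received == 0 && !pool_has_free_buffers())
                    flush_tx_completions_all(pktio);

            return received;
    }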
Regards,
Zoli
On 02/06/15 10:34, Maciej Czekaj wrote:
Zoltan,
I am currently working on the ThunderX port, so I can offer some insight
into one of the implementations.
ThunderX has a more server-like network adapter, as opposed to Octeon or
QorIQ, so buffer management is done in software.
I think the problem with pool starvation affects mostly those kinds of
platforms and any mismanagement here may have dire costs.
On Thunder the buffers are handled the same way as in DPDK, so
transmitted buffers have to be "reclaimed" after the hardware has
finished processing them.
The best and most efficient way to free the buffers is to do it while
transmitting others on the same queue, i.e. in odp_pktio_send or in the
enqueue operation (a rough sketch follows the list below). There are
several reasons behind this:
1. The TX ring is accessed anyway, so this minimizes cache misses.
2. The TX ring's H/W registers are accessed while transmitting packets,
so information about ring occupancy is already extracted by software.
This minimizes the overhead of H/W register access, which may be quite
significant even on an internal PCI-E bus.
3. Any other scheme, e.g. doing it in the mempool or in RX as suggested
previously, incurs the extra overhead from points 1 and 2, plus further
overhead from synchronizing access to the ring:
    - accessing the TX ring from the mempool must be thread-safe, since
the mempool may be invoked from a different context than ring
transmission
    - accessing the transmission ring from the receive operation leads to
a similar thread-safety issue, where RX and TX, being independent
operations from the H/W perspective, must be additionally synchronized
with respect to each other
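A rough sketch of points 1 and 2 (the tx_ring helpers below are made-up
names for illustration, not ThunderX driver code):

    #include <odp.h>

    /* Hypothetical driver-level helpers, named only for illustration. */
    struct tx_ring;
    void tx_ring_reclaim(struct tx_ring *ring);
    int  tx_ring_enqueue(struct tx_ring *ring, odp_packet_t pkts[], int num);

    static int send_with_reclaim(struct tx_ring *ring,
                                 odp_packet_t pkts[], int num)
    {
            /* The ring's H/W registers are read for the transmit anyway,
             * so harvest the descriptors the hardware has finished with
             * and free their buffers here, with no extra cache misses
             * and no extra synchronization against RX or the mempool. */
            tx_ring_reclaim(ring);

            /* Then place the new packets on the same ring. */
            return tx_ring_enqueue(ring, pkts, num);
    }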
Summarizing, any high-performance implementation must live with the fact
that some buffers will be kept in the TX ring for a while, and choose
the mempool size accordingly. This is true at least for Thunder and
other similar "server" adapters. On the other hand, the issue may be
non-existent in specialized network processors, but then there is no
need for an extra API or extra software tricks anyway.
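As a purely illustrative lower bound (the variable names are mine, not
ThunderX driver parameters), "accordingly" could mean something like:

    /* Buffers parked in every TX ring, buffers pre-filled into every RX
     * ring, and each worker's in-flight burst must all fit in the pool. */
    unsigned pool_size = num_ports * (tx_ring_size + rx_ring_size)
                       + num_workers * burst_size;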
Memory pressure may come not only from the TX ring but from the RX ring
as well, when it is flooded with packets. That leads to the same
challenge, only reversed: the receive function greedily allocates
packets to feed the H/W with as many free buffers as possible, and there
is currently no way to limit that.
That is why, from the Thunder perspective, a practical solution is:
- explicitly stating the "depth" of the engine (both RX and TX) through
an API or some parameter, and letting the implementer choose how to deal
with the problem (see the sketch after this list)
- adding a note that transmission functions are responsible for buffer
cleanup, to let the application choose the best strategy
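One possible shape of such a "depth" knob, purely illustrative (nothing
like this exists in the ODP API today):

    #include <stdint.h>

    /* Hypothetical configuration, only to illustrate the idea above. */
    struct pktio_ring_config {
            uint32_t rx_depth;  /* max free buffers pre-allocated to RX    */
            uint32_t tx_depth;  /* max sent buffers left unreclaimed on TX */
    };

With both depths known up front, the application can size its pool the
way sketched earlier.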
This is by no means a silver bullet, but it gives the user the tools to
deal with the problem and at the same time does not impose unnecessary
overhead on certain implementations.
Cheers
Maciej
2015-05-29 18:03 GMT+02:00 Zoltan Kiss <[email protected]>:
Hi,
On 29/05/15 16:58, Jerin Jacob wrote:
I agree. Is it possible to dedicate "core 0"/"any core" in the ODP-DPDK
implementation to do the housekeeping job? If we are planning the
ODP-DPDK implementation as just a wrapper over the DPDK API, then there
will not be any added value in using the ODP API. At least from my
experience, we have changed our SDK "a lot" to fit into the ODP model.
IMO that kind of effort will be required for a useful ODP-DPDK port.
It would be good to have some input from other implementations as
well: when do you release the sent packets in the Cavium implementation?
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp