On 29 May 2015 at 13:55, Zoltan Kiss <[email protected]> wrote:
>
>
> On 28/05/15 17:40, Ola Liljedahl wrote:
>
>> On 28 May 2015 at 17:23, Zoltan Kiss <[email protected]> wrote:
>>
>>
>>
>> On 28/05/15 16:00, Ola Liljedahl wrote:
>>
>> I disapprove of this solution. TX completion processing (cleaning TX
>> descriptor rings after transmission complete) is an implementation
>> (hardware) aspect and should be hidden from the application.
>>
>>
>> Unfortunately you can't, if you want your pktio application to work
>> with poll mode drivers. In that case the TX completion interrupt can
>> be disabled and the application has to take care of that as well. In
>> the case of DPDK you just call the send function (with 0 packets, if
>> you don't have anything to send at the time).
>>
>> Why do you have to retire transmitted packets if you are not transmitting
>> new packets (and need those descriptors in the TX ring)?
>>
> Because otherwise they are a memory leak.
They are not leaked! They are still in the TX ring, just waiting to get
retired.
> Those buffers might be needed somewhere else. If they are only released
> the next time you send or receive packets, you are in trouble, because
> that might never happen. Especially when that event is blocked because your
> TX ring is full of unreleased packets.
Having too few buffers is always a problem. You don't want to have too large
RX/TX rings because that just increases buffering and latency (the "buffer
bloat" problem).
>
>> Does the application have too few packets in the pool so that reception
>> will suffer?
>>
> Let me approach the problem from a different angle: the current workaround
> is that you have to allocate a pool with _loooads_ of buffers, so you have
> a good chance you never run out of free buffers. Probably. Because it still
> doesn't guarantee that there will be a next send/receive event on that
> interface to release the packets.
>
>>
>> There isn't any corresponding call that refills the RX descriptor rings
>> with fresh buffers.
>>
>> You can do that in the receive function, I think that's how the
>> drivers are doing it generally.
>>
>>
>> The completion processing can be performed from any ODP call, not
>> necessarily odp_pktio_send().
>>
>>
>> I think "any" is not specific enough. Which one?
>>
>> odp_pktio_recv, odp_schedule. Wherever the application blocks or
>> busy-waits for more packets.
>>
> We do that already on odp_pktio_recv. It doesn't help, because you can
> only release the buffers held in the current interface's TX ring. You can't
> do anything about other interfaces.
>
Why not?
There is no guarantee that the application thread calling odp_pktio_recv()
on one interface is the only one transmitting on that specific egress
interface. In the general case, all threads may be using all pktio
interfaces for both reception and transmission.
> I mean, you could trigger TX completion on every interface every time you
> receive on one, but that would be a scalability nightmare.
Maybe not every time. I expect a more creative solution than this. Perhaps
when you run out of buffers in the pool?
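A rough sketch of what I mean (pktio_tbl[] and num_pktio stand for whatever
list of open interfaces the application or implementation already keeps; the
zero-length send as a completion kick is the ODP-DPDK behaviour you describe
above, and I'm assuming the implementation tolerates a NULL table when len
is 0):

    /* Slow-path reclaim: only called when the pool looks exhausted, e.g.
     * when a packet allocation fails or odp_pktio_recv() cannot refill
     * its RX ring. */
    static void reclaim_tx_buffers(odp_pktio_t pktio_tbl[], int num_pktio)
    {
            int i;

            for (i = 0; i < num_pktio; i++)
                    /* zero-length send: nothing is transmitted, it only
                     * gives the driver a chance to retire sent packets */
                    odp_pktio_send(pktio_tbl[i], NULL, 0);
    }

That keeps the cost off the fast path: the per-interface kick only happens
when buffers are actually scarce, not on every receive.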
>
>>
>> Can you provide a vague draft of how you would fix the l2fwd example
>> below?
>>
>> I don't think anything needs fixing on the application level.
>>
>
> Wrong. odp_l2fwd uses one packet pool, receives from pktio_src and then if
> there is anything received, it sends it out on pktio_dst.
This specific application has this specific behavior. Are you sure this is
a general solution? I am not.
> Let's say the pool has 576 elements, and the interfaces use 256 RX and
> 256 TX descriptors. You start with 2*256 buffers kept in the two RX rings.
> Let's say you receive the first 64 packets; you refill the RX ring
> immediately, so now you're out of buffers. You can send out those 64, but
> in the next iteration odp_pktio_recv() will return 0 because it can't
> refill the RX descriptors (and the driver won't give you back any buffers
> unless it can refill the ring). And now you are in an infinite loop: recv
> will always return 0 because you never release the packets.
>
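To spell out the arithmetic in that scenario:

    pool size:                    576 buffers
    held in the two RX rings:     2 * 256 = 512 buffers
    free in the pool at start:    576 - 512 = 64 buffers
    after the first 64 packets:   the RX ring is refilled from the last 64
                                  free buffers, the 64 sent packets sit
                                  unretired in pktio_dst's TX ring, the pool
                                  is empty, and odp_pktio_recv() can no
                                  longer refill, so it returns 0 from then on.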
The size of the pool should somehow be correlated with the size of the RX
and TX rings for "best performance" (whatever this means). But I also think
that the system should function regardless of RX/TX ring sizes and pool
size, "function" meaning not deadlock.
> There are several ways to fix this:
> - tell the application writer that if you see deadlocks, increase the
> number of elements in the buffer pool. I doubt anyone would ever use ODP
> for anything serious after seeing such a thing.
> - you can't really give anything more specific than in the previous point,
> because such details as RX/TX descriptor numbers are abstracted away,
> intentionally. And your platform can't autotune them, because it doesn't
> know how many elements you have in the pool used for TX. In fact, it could
> be more than just one pool.
> - make sure that you run odp_pktio_send() even if pkts == 0. In the case
> of ODP-DPDK it can help because that actually triggers TX completion.
> Actually, we can make odp_pktio_send_complete() == odp_pktio_send(len=0),
> so we don't have to introduce a new function. But that doesn't change the
> fact that we have to call TX completion periodically to make sure nothing
> is blocked (sketched below).
>
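For concreteness, the workaround in that last point would make the odp_l2fwd
loop look roughly like this (a sketch only; it assumes, as described above,
that a zero-length odp_pktio_send() triggers TX completion on ODP-DPDK, and
MAX_PKT_BURST is an assumed name for the example's burst size):

    while (!exit_threads) {
            /* kick TX completion on the egress interface even when there
             * is nothing to send, so previously sent packets get retired */
            odp_pktio_send(pktio_dst, pkt_tbl, 0);

            /* MAX_PKT_BURST: assumed burst size constant */
            pkts = odp_pktio_recv(pktio_src, pkt_tbl, MAX_PKT_BURST);
            if (pkts <= 0)
                    continue;
            /* ... filtering/processing produces pkts_ok ... */
            if (pkts_ok > 0)
                    odp_pktio_send(pktio_dst, pkt_tbl, pkts_ok);
    }

Whether that extra call is spelled odp_pktio_send(len=0) or
odp_pktio_send_complete() is secondary; the point is that the application has
to remember to make it.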
So why doesn't the ODP-for-DPDK implementation call TX completion
"periodically" or at some other suitable times?
> - or we can just do what I proposed in the patch, which is very similar to
> the previous point but articulates the importance of TX completion more.
>
Which is a platform-specific problem and exactly the kind of thing that the
ODP API should hide, not expose.
>
>
>
>>
>> -- Ola
>>
>>
>> On 28 May 2015 at 16:38, Zoltan Kiss <[email protected]> wrote:
>>
>> A pktio interface can be used with poll mode drivers, where TX
>> completion often has to be done manually. This turned up as a problem
>> with ODP-DPDK and odp_l2fwd:
>>
>> while (!exit_threads) {
>>         pkts = odp_pktio_recv(pktio_src,...);
>>         if (pkts <= 0)
>>                 continue;
>>         ...
>>         if (pkts_ok > 0)
>>                 odp_pktio_send(pktio_dst, pkt_tbl, pkts_ok);
>>         ...
>> }
>>
>> In this example we never call odp_pktio_send() on pktio_dst if there
>> weren't any new packets received on pktio_src. DPDK needs manual TX
>> completion. The above example should have an
>> odp_pktio_send_complete(pktio_dst) right at the beginning of the loop.
>>
>> Signed-off-by: Zoltan Kiss <[email protected]>
>>
>> ---
>> include/odp/api/packet_io.h | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/include/odp/api/packet_io.h b/include/odp/api/packet_io.h
>> index b97b2b8..3a4054c 100644
>> --- a/include/odp/api/packet_io.h
>> +++ b/include/odp/api/packet_io.h
>> @@ -119,6 +119,22 @@ int odp_pktio_recv(odp_pktio_t pktio, odp_packet_t pkt_table[], int len);
>>  int odp_pktio_send(odp_pktio_t pktio, odp_packet_t pkt_table[], int len);
>>
>>  /**
>> + * Release sent packets
>> + *
>> + * This function should be called after sending on a pktio. If the platform
>> + * doesn't implement send completion in other ways, this function should call
>> + * odp_packet_free() on packets where transmission is already completed. It can
>> + * be a no-op if the platform guarantees that the packets will be released upon
>> + * completion, but the application must call it periodically after send to make
>> + * sure packets are released.
>> + *
>> + * @param pktio ODP packet IO handle
>> + *
>> + * @retval <0 on failure
>> + */
>> +int odp_pktio_send_complete(odp_pktio_t pktio);
>> +
>> +/**
>>   * Set the default input queue to be associated with a pktio handle
>>   *
>>   * @param pktio ODP packet IO handle
>> --
>> 1.9.1
>>
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp