On 2/22/24 21:28, Konstantin Ananyev wrote:

+CC: Ethernet API maintainers
+CC: Jerin (commented on another branch of this thread)

From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
Sent: Sunday, 11 February 2024 16.04

TSO breaks when MSS spans more than 8 data fragments. Those
packets will be dropped by Tx preparation API, but it will
cause
MDD event if txonly forwarding engine does not call the Tx
preparation
API before transmitting packets.


txonly is used commonly, adding Tx prepare for a specific case
may
impact performance for users.

What happens when driver throws MDD (Malicious Driver Detection)
event,
can't it be ignored? As you are already OK to drop the packet,
can
device be configured to drop these packages?


Or as Jerin suggested adding a new forwarding engine is a
solution,
but
that will create code duplication, I prefer to not have it if
this
can
be handled in device level.

Actually I am agree with the author of the patch - when TX offloads
and/or multisegs are enabled,
user supposed to invoke eth_tx_prepare().
Not doing that seems like a bug to me.

I strongly disagree with that statement, Konstantin!
It is not documented anywhere that using TX offloads and/or multisegs
requires calling rte_eth_tx_prepare() before
rte_eth_tx_burst(). And none of the examples do it.

In fact, we do use it for test-pmd/csumonly.c.
About other sample apps:
AFAIK, not many of other DPDK apps do use L4 offloads.
Right now special treatment (pseudo-header cksum calculation) is needed
only for L4 offloads (CKSUM, SEG).
So, majority of our apps who rely on other TX offloads (multi-seg, ipv4
cksum, vlan insertion) happily run without
calling tx_prepare(), even though it is not the safest way.


In my opinion:
If some driver has limitations for a feature, e.g. max 8 fragments,
it should be documented for that driver, so the application
developer can make the appropriate decisions when designing the
application.
Furthermore, we have APIs for the drivers to expose to the
applications what the driver supports, so the application can configure
itself optimally at startup. Perhaps those APIs need to be expanded.
And if a feature limitation is common across the majority of drivers,
that limitation should be mentioned in the documentation of the
feature itself.

Many of such limitations *are* documented and in fact we do have an API
to check max segments that each driver support,
see struct rte_eth_desc_lim.

Yes, this is the kind of API we should provide, so the application can 
configure itself appropriately.

The problem is:
- none of our sample app does proper check on these values, so users
don't have a good example how to do it.

Agreed.
Adding an example showing how to do it properly would be the best solution.
Calling tx_prepare() in the examples is certainly not the solution.

- with current DPDK API not all of HW/PMD requirements could be
extracted programmatically:
   let say majority of Intel PMDs for TCP offloads expect pseudo-header
cksum to be pre-calculated by the SW.

I hope this requirement is documented somewhere.

   another example, some HW expects pkt_len to be bigger then some
threshold value, otherwise HW hang may appear.

I hope this requirement is also documented somewhere.

No idea, I found it only in the code.

IMHO Tx burst must check such limitations. If you made your HW simpler
(or just lost it on initial testing), pay in your drivers (or your
HW+driver will be unusable because of such problems).

Generally, if the requirements cannot be extracted programmatically, they must 
be prominently documented, like this note to
rte_eth_rx_burst():

Obviously, more detailed documentation is always good, but...
Right now we have 50+ different PMDs from different vendors.
Even if each and every of them will carefully document all possible limitations 
and necessary preparation steps,
how DPDK app developer supposed to  deal with all that?
Do you expect everyone, to read carefully through all of them, and handle all 
of them properly oh his own
in each and every DPDK app he is going to write?
That seems unrealistic.
Again what to do with backward compatibility: when new driver (with new 
limitations) will arise
*after* your app is already written and tested?

+1


  * @note
  *   Some drivers using vector instructions require that *nb_pkts* is
  *   divisible by 4 or 8, depending on the driver implementation.

I'm wondering what application should do if it needs to send just one packet and do it now. IMHO, such limitations are not acceptable.


- As new HW and PMD keep appearing it is hard to predict what extra
limitations/requirements will arise,
   that's why tx_prepare() was introduced as s driver op.


We don't want to check in the fast path what can be checked at
startup or build time!

If your app supposed to work with just a few, known in advance, NIC
models, then sure, you can do that.
For apps that supposed to work 'in general'  with any possible PMDs
that DPDK supports - that might be a problem.
That's why tx_prepare() was introduced and it is strongly recommended
to use it by the apps that do use TX offloads.
Probably tx_prepare() is not the best possible approach, but right now
there are not many alternatives within DPDK.

What exactly is an application supposed to do if tx_prepare() doesn't accept 
the full burst? It doesn't return information about what is
wrong.

It provides some information, but it is *very* limited: just index of 'bad' 
packet and error code.
In theory app can try to handle it in a 'smart' way: let say if ENOTSUP is 
returned, then try to disable all HW offloads
and do all in SW. But again, it is much better to do so *before* submitting 
packets for TX, so in practice everyone
just drop such 'bad' packets.

  Dropping the packets might not be an option, e.g. for applications used in 
life support or tele-medicine.

Critical applications should be able to do all Tx offloads in SW and retry. Of course, various statistics and logs should help to improve the
application.

If the packet is 'bad', then it is much better to drop it, then TX corrupted 
packet or even hang NIC HW completely.

IMHO Tx burst must drop packet which could hang NIC HW completely. I realize that it is an extra checks and performance drop, but vendor
should pay in performance if HW is not good enough.

Though off-course it is much better to have an app that would check for 
limitations that can be checked by API provided
and enable only supported offloads.

Yes, that's why API to get limitation is much better than documentation.

If limitations are documented, an application can use the lowest common 
denominator of the NICs it supports. And if the application is
supposed to work in general, that becomes the lowest common denominator of all 
NICs.

I agree: for limitations that can be extracted with generic API, like:
number of segments per packet, supported TX offloads, mbuf fileds that must be 
provided for each TX offload,  etc. -
it is responsibility of well-written application to obey all of them.
Yes, many tx_prepare() implementations do  such checks anyway, but from my 
perspective it is sort of last-line of defense.
For well written application that should just never happen.
But there is one more important responsibility of tx_prepare() -  it performs 
PMD specific packet modifications for requested offloads.
As I already mentioned for Intel NICs - it does pseudo-header cksum 
calcucation, for packets with size less then minimal, it can
probably do padding (even if doesn't do it right now), for some other PMDs - 
might be something else, I didn't check.
Obviously it saves app developer from a burden to do all these things on his 
own.

It looks like tx_prepare() has become a horrible workaround for undocumented 
limitations.

I strongly disagree. Documentation is never a solution for a generic application which is intended to work on any HW and IMHO it is the
goal to have more and more applications which work on any HW.

Limitations due to hardware and/or software tradeoffs are unavoidable, so we 
have to live with them; but we should not accept
PMDs with undocumented limitations.

As I already said, more documentation never hurts, but for that case, I think 
it is not enough.
I expect PMD to provide a tx_prepare() implementation that would deal with such 
specific things.

+1
Anyway, back to the original patch - I looked at it once again, and realized 
that the problem
is just in the unsupported number of segments.
As we discussed above - such limitations should be handled by well written app,
but none of DPDK apps does it right now.
So probably it is good opportunity to do things in a proper way and introduce 
such checks
in testpmd ;)

+1 since we already have fields in device information to report such limitations, but it does not say that Tx prepare should be dropped.
Drivers which don't need Tx prepare keep it NULL and it returns
immediately from ethdev. Since it is done per-burst, it should not
affect performance a lot.

Kaiwen, WDYT?



If it still works for some cases, that's a lucky coincidence, but not
the expected behavior.
About performance - first we can check is that really a drop.
Also as I remember most drivers set it to non-NULL value, only when
some TX offloads were
enabled by the user on that port, so hopefully for simple case (one
segment, no tx offloads) it
should be negligible.
Again, we can add manual check in testpmd tx-only code to decide do we
need a TX prepare
to be called or not.
Konstantin

Reply via email to