On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > Something is definitely needed here, and only PMDs can provide it. I think
> > applications should not have to clear checksum fields or initialize them to
> > some magic value, same goes for any other offload or hardware limitation
> > that needs to be worked around.
> > tx_prep() is one possible answer to this issue, however as mentioned in the
> > original patch it can be very expensive if exposed by the PMD.
> > Another issue I'm more concerned about is the way limitations are managed
> > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> > structure contains new fields that are only relevant to a few devices, and I
> > fear it will keep growing with each new hardware quirk to manage, breaking
> > ABIs in the process.
> Well, if some new HW capability/limitation would arise and we'd like to
> support it in DPDK, then yes we probably would need to think how to
> incorporate it.
> Do you have anything particular in mind here?
Nothing in particular, so for the sake of the argument, let's suppose that I
would like to add a field to expose some limitation that only applies to my
PMD during TX but looks generic enough to make sense, e.g. maximum packet
size when VLAN tagging is requested. PMDs are free to set that field to some
special value (say, 0) if they do not care.
Since that field exists however, conscious applications should check its
value for each packet that needs to be transmitted. This extra code causes a
slowdown just by sitting in the data path. Since it is not the only field in
that structure, the performance impact can be significant.
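To make that cost concrete, here is a minimal sketch of the per-packet test
such an application would end up carrying. Everything in it is hypothetical:
struct desc_lim_sketch, its max_vlan_pkt_len field and struct pkt are
stand-ins made up for this example, not the actual struct rte_eth_desc_lim
or mbuf definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-device limits structure. */
struct desc_lim_sketch {
	uint32_t max_vlan_pkt_len; /* 0 means "this PMD does not care" */
};

/* Hypothetical, minimal packet descriptor. */
struct pkt {
	uint32_t len;
	bool vlan_insert; /* VLAN tagging requested for this packet */
};

/* Return true if the packet respects the (hypothetical) VLAN size limit. */
static bool
pkt_fits_vlan_limit(const struct desc_lim_sketch *lim, const struct pkt *p)
{
	if (lim->max_vlan_pkt_len == 0 || !p->vlan_insert)
		return true;
	return p->len <= lim->max_vlan_pkt_len;
}

/* Count how many leading packets of the burst pass the check. */
static size_t
count_sendable(const struct desc_lim_sketch *lim, const struct pkt *pkts,
	       size_t n)
{
	size_t i;

	assert(lim != NULL);
	for (i = 0; i < n; i++)
		if (!pkt_fits_vlan_limit(lim, &pkts[i]))
			break;
	return i;
}
```

Note that the branch is evaluated for every packet even when the field is 0,
and one such branch would be needed per field in the structure.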
Even though this code is inside applications, it remains unfair to PMDs for
which these tests are irrelevant. This problem is identified and addressed
by tx_prepare().
Thanks to tx_prepare(), these checks are moved back into PMDs where they
belong. PMDs that do not need them do not have to provide support for
tx_prepare() and do not suffer any performance impact as a result;
applications only have to make sure tx_prepare() is always called at some
point before tx_burst().
Once you reach this stage, you've effectively made tx_prepare() mandatory
before tx_burst(). If some bug occurs, then perhaps you forgot to call
tx_prepare() and just need to add it. The total cost for doing TX is
therefore tx_prepare() + tx_burst().
I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
remain optional for long. Sure, PMDs that do not implement it do not care,
but I'm focusing on applications, for which the performance impact of calling
tx_prepare() followed by tx_burst() is higher than a single tx_burst()
performing all the necessary preparation at once.
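A rough sketch of what I mean, using mock functions rather than the real
ethdev calls. The sketch_* names, struct pkt and the pkt_visits counter are
all assumptions made up for illustration; counting per-packet iterations is
only a crude proxy for cost, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal packet descriptor. */
struct pkt {
	uint16_t cksum; /* field the device expects to be preset */
	int sent;
};

static unsigned long pkt_visits; /* number of per-packet iterations */

/* Hypothetical preparation step: fix up whatever the device needs. */
static void
fixup(struct pkt *p)
{
	p->cksum = 0;
}

/* Separate preparation pass: iterate and update packets. */
static size_t
sketch_tx_prepare(struct pkt *pkts, size_t n)
{
	assert(pkts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		pkt_visits++;
		fixup(&pkts[i]);
	}
	return n;
}

/* Plain send pass: iterate and send, assuming packets are ready. */
static size_t
sketch_tx_burst(struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pkt_visits++;
		pkts[i].sent = 1;
	}
	return n;
}

/* Combined variant: iterate once, update as needed and send. */
static size_t
sketch_tx_burst_combined(struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pkt_visits++;
		fixup(&pkts[i]);
		pkts[i].sent = 1;
	}
	return n;
}
```

In this toy model, calling sketch_tx_prepare() followed by sketch_tx_burst()
walks the burst twice, while sketch_tx_burst_combined() walks it once.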
> > Following the same logic, why can't such a thing be made part of the TX
> > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > whenever necessary). From an application standpoint, what are the advantages
> > of having to:
> > if (tx_prep()) // iterate and update mbufs as needed
> > tx_burst(); // iterate and send
> > Compared to:
> > tx_burst(); // iterate, update as needed and send
> I think that was discussed extensively quite a lot previously here:
> As Thomas already replied - main motivation is to allow user
> to execute them on different stages of packet TX pipeline,
> and probably on different cores.
> I think that provides better flexibility to the user to when/where
> do these preparations and hopefully would lead to better performance.
And I agree, I think this use case is valid but does not warrant such a high
penalty when your application does not need that much flexibility. Simple
(yet conscious) applications need the highest performance. Complex ones as
you described already suffer quite a bit from IPCs and won't mind a couple
of extra CPU cycles, right?
Yes they will, therefore we need a method that satisfies both cases.
As a possible solution, a special mbuf flag could be added to each mbuf
having gone through tx_prepare(). That way, tx_burst() could skip some
checks and things it would otherwise have done.
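A minimal sketch of the flag idea, under stated assumptions:
SKETCH_TX_PREPARED is a made-up flag bit (nothing like it exists in the mbuf
API today), and struct pkt and the sketch_* functions are mock stand-ins.
Clearing the flag after transmission is my own assumption, so that a
recycled buffer is not mistaken for a prepared one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; not an existing mbuf offload flag. */
#define SKETCH_TX_PREPARED (UINT64_C(1) << 0)

/* Hypothetical, minimal packet descriptor. */
struct pkt {
	uint64_t ol_flags;
	uint16_t cksum;
};

static unsigned long fixups_done; /* how many times fixup() ran */

/* Hypothetical preparation step: fix up whatever the device needs. */
static void
fixup(struct pkt *p)
{
	p->cksum = 0;
	fixups_done++;
}

/* Prepare packets and mark them so that the burst function can tell. */
static size_t
sketch_tx_prepare(struct pkt *pkts, size_t n)
{
	assert(pkts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		fixup(&pkts[i]);
		pkts[i].ol_flags |= SKETCH_TX_PREPARED;
	}
	return n;
}

/* Send packets, preparing only those not already marked. */
static size_t
sketch_tx_burst(struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(pkts[i].ol_flags & SKETCH_TX_PREPARED))
			fixup(&pkts[i]);
		/* ... hand the packet to hardware here ... */
		pkts[i].ol_flags &= ~SKETCH_TX_PREPARED;
	}
	return n;
}
```

The test on the flag is still a per-packet branch in tx_burst(), so this
reduces the duplicated work rather than removing it entirely.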
Another possibility is to tell the PMD up front that you always intend to
use tx_prepare() and get a simpler/faster tx_burst() callback as a result.
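This second option could look something like the sketch below. The
app_always_prepares parameter, struct txq_sketch and both burst variants are
hypothetical names for illustration; no such configuration knob exists as
far as this discussion goes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal packet descriptor. */
struct pkt {
	uint16_t cksum;
	bool sent;
};

typedef size_t (*tx_burst_fn)(struct pkt *pkts, size_t n);

/* Full variant: performs the preparation itself before sending. */
static size_t
tx_burst_full(struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pkts[i].cksum = 0; /* stand-in for the real fix-up */
		pkts[i].sent = true;
	}
	return n;
}

/* Fast variant: trusts that the application already prepared packets. */
static size_t
tx_burst_fast(struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pkts[i].sent = true;
	return n;
}

/* Hypothetical TX queue holding the selected burst callback. */
struct txq_sketch {
	tx_burst_fn tx_burst;
};

/* Select the callback once, at setup time, from the application's hint. */
static void
txq_setup(struct txq_sketch *q, bool app_always_prepares)
{
	assert(q != NULL);
	q->tx_burst = app_always_prepares ? tx_burst_fast : tx_burst_full;
}
```

The choice is made once at queue setup, so the data path carries no extra
per-packet branch in either mode.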
> Though, if you or any other PMD developer/maintainer would prefer
> for particular PMD to combine both functionalities into tx_burst() and
> keep tx_prep() as NOP - this is still possible too.
Whether they implement it or not, this issue does not impact PMDs anyway. We
should probably ask DPDK application developers instead.
> > For both mlx4 and mlx5 then,
> > "it is OK, we do not need any checksum preparation for TSO".
> > Actually I do not think we'll ever need tx_prep() unless we add our own
> > quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
> > handled by TX burst functions.
> Ok, so MLX PMD is not affected by these changes and tx_prep for MLX can be
> set to NULL, correct?
Correct. Actually, the rest of this message should be in a separate
thread. From the MLX side, there is no issue with tx_prepare().