On Fri, Oct 17, 2025 at 10:41 AM Aditya Garg <[email protected]> wrote: > > On 08-10-2025 20:58, Aditya Garg wrote: > > On 08-10-2025 20:51, Eric Dumazet wrote: > >> On Wed, Oct 8, 2025 at 8:16 AM Aditya Garg > >> <[email protected]> wrote: > >>> > >>> On 03-10-2025 21:45, Eric Dumazet wrote: > >>>> On Fri, Oct 3, 2025 at 8:47 AM Aditya Garg > >>>> <[email protected]> wrote: > >>>>> > >>>>> The MANA hardware supports a maximum of 30 scatter-gather entries > >>>>> (SGEs) > >>>>> per TX WQE. In rare configurations where MAX_SKB_FRAGS + 2 exceeds > >>>>> this > >>>>> limit, the driver drops the skb. Add a check in mana_start_xmit() to > >>>>> detect such cases and linearize the SKB before transmission. > >>>>> > >>>>> Return NETDEV_TX_BUSY only for -ENOSPC from > >>>>> mana_gd_post_work_request(), > >>>>> send other errors to free_sgl_ptr to free resources and record the tx > >>>>> drop. > >>>>> > >>>>> Signed-off-by: Aditya Garg <[email protected]> > >>>>> Reviewed-by: Dipayaan Roy <[email protected]> > >>>>> --- > >>>>> drivers/net/ethernet/microsoft/mana/mana_en.c | 26 +++++++++++++ > >>>>> ++---- > >>>>> include/net/mana/gdma.h | 8 +++++- > >>>>> include/net/mana/mana.h | 1 + > >>>>> 3 files changed, 29 insertions(+), 6 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/ > >>>>> drivers/net/ethernet/microsoft/mana/mana_en.c > >>>>> index f4fc86f20213..22605753ca84 100644 > >>>>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c > >>>>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c > >>>>> @@ -20,6 +20,7 @@ > >>>>> > >>>>> #include <net/mana/mana.h> > >>>>> #include <net/mana/mana_auxiliary.h> > >>>>> +#include <linux/skbuff.h> > >>>>> > >>>>> static DEFINE_IDA(mana_adev_ida); > >>>>> > >>>>> @@ -289,6 +290,19 @@ netdev_tx_t mana_start_xmit(struct sk_buff > >>>>> *skb, struct net_device *ndev) > >>>>> cq = &apc->tx_qp[txq_idx].tx_cq; > >>>>> tx_stats = &txq->stats; > >>>>> > >>>>> + BUILD_BUG_ON(MAX_TX_WQE_SGL_ENTRIES != > >>>>> MANA_MAX_TX_WQE_SGL_ENTRIES); > >>>>> + #if (MAX_SKB_FRAGS + 2 > MANA_MAX_TX_WQE_SGL_ENTRIES) > >>>>> + if (skb_shinfo(skb)->nr_frags + 2 > > >>>>> MANA_MAX_TX_WQE_SGL_ENTRIES) { > >>>>> + netdev_info_once(ndev, > >>>>> + "nr_frags %d exceeds max > >>>>> supported sge limit. Attempting skb_linearize\n", > >>>>> + skb_shinfo(skb)->nr_frags); > >>>>> + if (skb_linearize(skb)) { > >>>> > >>>> This will fail in many cases. > >>>> > >>>> This sort of check is better done in ndo_features_check() > >>>> > >>>> Most probably this would occur for GSO packets, so can ask a software > >>>> segmentation > >>>> to avoid this big and risky kmalloc() by all means. > >>>> > >>>> Look at idpf_features_check() which has something similar. > >>> > >>> Hi Eric, > >>> Thank you for your review. I understand your concerns regarding the use > >>> of skb_linearize() in the xmit path, as it can fail under memory > >>> pressure and introduces additional overhead in the transmit path. Based > >>> on your input, I will work on a v2 that will move the SGE limit check to > >>> the ndo_features_check() path and for GSO skbs exceding the hw limit > >>> will disable the NETIF_F_GSO_MASK to enforce software segmentation in > >>> kernel before the call to xmit. > >>> Also for non GSO skb exceeding the SGE hw limit should we go for using > >>> skb_linearize only then or would you suggest some other approach here? > >> > >> I think that for non GSO, the linearization attempt is fine. > >> > >> Note that this is extremely unlikely for non malicious users, > >> and MTU being usually small (9K or less), > >> the allocation will be much smaller than a GSO packet. > > > > Okay. Will send a v2 > Hi Eric, > I tested the code by disabling GSO in ndo_features_check when the number > of SGEs exceeds the hardware limit, using iperf for a single TCP > connection with zerocopy enabled. I noticed a significant difference in > throughput compared to when we linearize the skbs. > For reference, the throughput is 35.6 Gbits/sec when using > skb_linearize, but drops to 6.75 Gbits/sec when disabling GSO per skb.
You must be doing something very wrong. Difference between TSO and non TSO should not be that high. ethtool -K eth0 tso on netperf -H tjbp27 MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to tjbp27.prod.google.com () port 0 AF_INET6 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 10.00 92766.69 ethtool -K eth0 tso off netperf -H tjbp27 MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to tjbp27.prod.google.com () port 0 AF_INET6 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 10.00 52218.97 Now if I force linearization, you can definitely see the very high cost of the copies ! ethtool -K eth1 sg off tjbp26:/home/edumazet# ./netperf -H tjbp27 MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to tjbp27.prod.google.com () port 0 AF_INET6 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 10.00 16951.32 > > Hence, We propose to linearizing skbs until the first failure occurs. Hmm... basically hiding a bug then ? > After that, we switch to a fail-safe mode by disabling GSO for SKBs with > sge > hw limit using the ndo_feature_check implementation, while > continuing to apply skb_linearize() for non-GSO packets that exceed the > hardware limit. This ensures we remain on the optimal performance path > initially, and only transition to the fail-safe path after encountering > a failure. Please post your patch (adding the check in ndo_features_check()), perhaps one of us is able to help.
