On Tue, Nov 25, 2025 at 10:05 PM Simon Schippers <[email protected]> wrote:
>
> On 11/25/25 02:34, Jason Wang wrote:
> > On Mon, Nov 24, 2025 at 5:20 PM Simon Schippers
> > <[email protected]> wrote:
> >>
> >> On 11/24/25 02:04, Jason Wang wrote:
> >>> On Fri, Nov 21, 2025 at 5:23 PM Simon Schippers
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 11/21/25 07:19, Jason Wang wrote:
> >>>>> On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> This patch series deals with tun/tap and vhost-net which drop incoming
> >>>>>> SKBs whenever their internal ptr_ring buffer is full. Instead, with
> >>>>>> this
> >>>>>> patch series, the associated netdev queue is stopped before this
> >>>>>> happens.
> >>>>>> This allows the connected qdisc to function correctly as reported by
> >>>>>> [1]
> >>>>>> and improves application-layer performance, see our paper [2].
> >>>>>> Meanwhile
> >>>>>> the theoretical performance differs only slightly:
> >>>>>>
> >>>>>> +--------------------------------+-----------+----------+
> >>>>>> | pktgen benchmarks to Debian VM | Stock     | Patched  |
> >>>>>> | i5 6300HQ, 20M packets         |           |          |
> >>>>>> +-----------------+--------------+-----------+----------+
> >>>>>> | TAP             | Transmitted  | 195 Kpps  | 183 Kpps |
> >>>>>> |                 +--------------+-----------+----------+
> >>>>>> |                 | Lost         | 1615 Kpps | 0 pps    |
> >>>>>> +-----------------+--------------+-----------+----------+
> >>>>>> | TAP+vhost_net   | Transmitted  | 589 Kpps  | 588 Kpps |
> >>>>>> |                 +--------------+-----------+----------+
> >>>>>> |                 | Lost         | 1164 Kpps | 0 pps    |
> >>>>>> +-----------------+--------------+-----------+----------+
> >>>>>
> >>>>
> >>>> Hi Jason,
> >>>>
> >>>> thank you for your reply!
> >>>>
> >>>>> PPS drops somehow for TAP, any reason for that?
> >>>>
> >>>> I have no explicit explanation for that except general overheads coming
> >>>> with this implementation.
> >>>
> >>> It would be better to fix that.
> >>>
> >>>>
> >>>>>
> >>>>> Btw, I had some questions:
> >>>>>
> >>>>> 1) most of the patches in this series would introduce non-trivial
> >>>>> impact on the performance, we probably need to benchmark each or split
> >>>>> the series. What's more we need to run TCP benchmark
> >>>>> (throughput/latency) as well as pktgen see the real impact
> >>>>
> >>>> What could be done, IMO, is to activate tun_ring_consume() /
> >>>> tap_ring_consume() before enabling tun_ring_produce(). Then we could see
> >>>> if this alone drops performance.
> >>>>
> >>>> For TCP benchmarks, you mean userspace performance like iperf3 between a
> >>>> host and a guest system?
> >>>
> >>> Yes,
> >>>
> >>>>
> >>>>>
> >>>>> 2) I see this:
> >>>>>
> >>>>>         if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
> >>>>>                 drop_reason = SKB_DROP_REASON_FULL_RING;
> >>>>>                 goto drop;
> >>>>>         }
> >>>>>
> >>>>> So there could still be packet drop? Or is this related to the XDP path?
> >>>>
> >>>> Yes, there can be packet drops after a ptr_ring resize or a ptr_ring
> >>>> unconsume. Since those two happen so rarely, I figured we should just
> >>>> drop in this case.
> >>>>
> >>>>>
> >>>>> 3) The LLTX change would have performance implications, but the
> >>>>> benmark doesn't cover the case where multiple transmission is done in
> >>>>> parallel
> >>>>
> >>>> Do you mean multiple applications that produce traffic and potentially
> >>>> run on different CPUs?
> >>>
> >>> Yes.
> >>>
> >>>>
> >>>>> 4) After the LLTX change, it seems we've lost the synchronization with
> >>>>> the XDP_TX and XDP_REDIRECT path?
> >>>>
> >>>> I must admit I did not take a look at XDP and cannot really judge if/how
> >>>> lltx has an impact on XDP. But from my point of view, __netif_tx_lock()
> >>>> instead of __netif_tx_acquire(), is executed before the tun_net_xmit()
> >>>> call and I do not see the impact for XDP, which calls its own methods.
> >>>
> >>> Without LLTX tun_net_xmit is protected by tx lock but it is not the
> >>> case of tun_xdp_xmit. This is because, unlike other devices, tun
> >>> doesn't have a dedicated TX queue for XDP, so the queue is shared by
> >>> both XDP and skb. So XDP xmit path needs to be protected with tx lock
> >>> as well, and since we don't have queue discipline for XDP, it means we
> >>> could still drop packets when XDP is enabled. I'm not sure this would
> >>> defeat the whole idea or not.
> >>
> >> Good point.
> >>
> >>>
> >>>>>
> >>>>> 5) The series introduces various ptr_ring helpers with lots of
> >>>>> ordering stuff which is complicated, I wonder if we first have a
> >>>>> simple patch to implement the zero packet loss
> >>>>
> >>>> I personally don't see how a simpler patch is possible without using
> >>>> discouraged practices like returning NETDEV_TX_BUSY in tun_net_xmit or
> >>>> spin locking between producer and consumer. But I am open for
> >>>> suggestions :)
> >>>
> >>> I see NETDEV_TX_BUSY is used by veth:
> >>>
> >>> static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
> >>> {
> >>>         if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb)))
> >>>                 return NETDEV_TX_BUSY; /* signal qdisc layer */
> >>>
> >>>         return NET_RX_SUCCESS; /* same as NETDEV_TX_OK */
> >>> }
> >>>
> >>> Maybe it would be simpler to start from that (probably with a new
> >>> tun->flags?).
> >>>
> >>> Thanks
> >>
> >> Do you mean that this patchset could be implemented using the same
> >> approach that was used for veth in [1]?
> >> This could then also fix the XDP path.
> >
> > I think so.
>
> Okay, I will do so and submit a v7 when net-next opens again for 6.19.
> >
> >>
> >> But is returning NETDEV_TX_BUSY fine in our case?
> >
> > If it helps to avoid packet drop. But I'm not sure if qdisc is a must
> > in your case.
>
> I will try to avoid returning it.
>
> When no qdisc is connected, I will just drop like veth does.
>
> >
> >>
> >> Do you mean a flag that enables or disables the no-drop behavior?
> >
> > Yes, via a new flags that could be set via TUNSETIFF.
> >
> > Thanks
>
> I am not a fan of that, since I can not imagine a use case where
> dropping packets is desired.
Right, it's just for the case when we can see regression for some
specific test.

> veth does not introduce a flag either.
>
> Of course, if there is a major performance degradation, it makes sense.
> But I will benchmark it, and we will see.

Exactly.

Thanks

>
> Thank you!
>
> >
> >>
> >> Thanks!
> >>
> >> [1] Link:
> >> https://lore.kernel.org/netdev/174559288731.827981.8748257839971869213.stgit@firesoul/T/#u
> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> This patch series includes tun/tap, and vhost-net because they share
> >>>>>> logic. Adjusting only one of them would break the others. Therefore,
> >>>>>> the
> >>>>>> patch series is structured as follows:
> >>>>>> 1+2: new ptr_ring helpers for 3
> >>>>>> 3: tun/tap: tun/tap: add synchronized ring produce/consume with queue
> >>>>>> management
> >>>>>> 4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
> >>>>>> vhost-net
> >>>>>> 7: tun/tap & vhost-net: only now use the previous implemented
> >>>>>> functions to
> >>>>>> not break git bisect
> >>>>>> 8: tun/tap: drop get ring exports (not used anymore)
> >>>>>>
> >>>>>> Possible future work:
> >>>>>> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger
> >>>>>
> >>>>> This seems to be not easy. The tx completion depends on the userspace
> >>>>> behaviour.
> >>>>
> >>>> I agree, but I really would like to reduce the buffer bloat caused by the
> >>>> default 500 TUN / 1000 TAP packet queue without losing performance.
> >>>>
> >>>>>
> >>>>>> - Adaption of the netdev queue flow control for ipvtap & macvtap
> >>>>>>
> >>>>>> [1] Link:
> >>>>>> https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> >>>>>> [2] Link:
> >>>>>> https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> >>>>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>> Thanks! :)
> >>>>
> >>>
> >>
> >
>
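For what it's worth, below is a rough, untested sketch of how such a
TUNSETIFF-settable flag could gate the no-drop behaviour, loosely
following the veth pattern quoted above. IFF_NODROP, its bit value and
the helper name are made up here for illustration only; the actual flag
allocation and the interaction with the XDP path would still need to be
sorted out.

#include <linux/netdevice.h>
#include <linux/ptr_ring.h>
#include <linux/skbuff.h>

/* Illustrative only: assumed new TUNSETIFF flag bit, not an existing define. */
#define IFF_NODROP	0x0080

/*
 * Sketch of the xmit tail: with IFF_NODROP set, a full ring stops the
 * netdev queue and pushes back on the qdisc via NETDEV_TX_BUSY instead
 * of dropping, similar to veth_xdp_rx() above. The consumer side would
 * have to wake the queue again once it makes room in the ring.
 */
static netdev_tx_t tun_ring_xmit_sketch(struct ptr_ring *ring,
					struct netdev_queue *txq,
					unsigned int tun_flags,
					struct sk_buff *skb)
{
	if (likely(!ptr_ring_produce(ring, skb)))
		return NETDEV_TX_OK;

	if (!(tun_flags & IFF_NODROP)) {
		kfree_skb(skb);		/* current behaviour: drop on full ring */
		return NET_XMIT_DROP;
	}

	netif_tx_stop_queue(txq);	/* qdisc keeps the skb and retries later */
	return NETDEV_TX_BUSY;
}

With such a flag, existing users would keep today's drop-on-full
behaviour unless they opt in at TUNSETIFF time.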

