On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <m...@redhat.com> wrote: > > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote: > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after > > sendmsg") tries to defer the notification enabling by moving the logic > > out of the loop after the vhost_tx_batch() when nothing new is > > spotted. This will bring side effects as the new logic would be reused > > for several other error conditions. > > > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs() > > might return -EAGAIN and exit the loop and see there's still available > > buffers, so it will queue the tx work again until userspace feed the > > IOTLB entry correctly. This will slowdown the tx processing and may > > trigger the TX watchdog in the guest. > > > > Fixing this by stick the notificaiton enabling logic inside the loop > > when nothing new is spotted and flush the batched before. > > > > Reported-by: Jon Kohler <j...@nutanix.com> > > Cc: sta...@vger.kernel.org > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after > > sendmsg") > > Signed-off-by: Jason Wang <jasow...@redhat.com> > > --- > > drivers/vhost/net.c | 33 +++++++++++++-------------------- > > 1 file changed, 13 insertions(+), 20 deletions(-) > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > index 16e39f3ab956..3611b7537932 100644 > > --- a/drivers/vhost/net.c > > +++ b/drivers/vhost/net.c > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, > > struct socket *sock) > > int err; > > int sent_pkts = 0; > > bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX); > > - bool busyloop_intr; > > bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER); > > > > do { > > - busyloop_intr = false; > > + bool busyloop_intr = false; > > + > > if (nvq->done_idx == VHOST_NET_BATCH) > > vhost_tx_batch(net, nvq, sock, &msg); > > > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, > > struct socket *sock) > > break; > > /* Nothing new? Wait for eventfd to tell us they refilled. */ > > if (head == vq->num) { > > - /* Kicks are disabled at this point, break loop and > > - * process any remaining batched packets. Queue will > > - * be re-enabled afterwards. > > + /* Flush batched packets before enabling > > + * virqtueue notification to reduce > > + * unnecssary virtqueue kicks. > > */ > > + vhost_tx_batch(net, nvq, sock, &msg); > > So why don't we do this in the "else" branch"? If we are busy polling > then we are not enabling kicks, so is there a reason to flush?
It should be functional equivalent: do { if (head == vq->num) { vhost_tx_batch(); if (unlikely(busyloop_intr)) { vhost_poll_queue() } else if () { vhost_disable_notify(&net->dev, vq); continue; } return; } vs do { if (head == vq->num) { if (unlikely(busyloop_intr)) { vhost_poll_queue() } else if () { vhost_tx_batch(); vhost_disable_notify(&net->dev, vq); continue; } break; } vhost_tx_batch(); return; Thanks > > > > + if (unlikely(busyloop_intr)) { > > + vhost_poll_queue(&vq->poll); > > + } else if (unlikely(vhost_enable_notify(&net->dev, > > + vq))) { > > + vhost_disable_notify(&net->dev, vq); > > + continue; > > + } > > break; > > } > > > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, > > struct socket *sock) > > ++nvq->done_idx; > > } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); > > > > - /* Kicks are still disabled, dispatch any remaining batched msgs. */ > > vhost_tx_batch(net, nvq, sock, &msg); > > - > > - if (unlikely(busyloop_intr)) > > - /* If interrupted while doing busy polling, requeue the > > - * handler to be fair handle_rx as well as other tasks > > - * waiting on cpu. > > - */ > > - vhost_poll_queue(&vq->poll); > > - else > > - /* All of our work has been completed; however, before > > - * leaving the TX handler, do one last check for work, > > - * and requeue handler if necessary. If there is no work, > > - * queue will be reenabled. > > - */ > > - vhost_net_busy_poll_try_queue(net, vq); > > } > > > > static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) > > -- > > 2.34.1 >