On Mon, Jun 22, 2026 at 08:27:12PM +0800, Menglong Dong wrote:
> On 2026/6/22 06:31 Michael S. Tsirkin <[email protected]> wrote:
> > On Tue, Jun 16, 2026 at 07:59:12PM +0800, Menglong Dong wrote:
> [...]
> > >
> > > + vring_size = virtqueue_get_vring_size(sq->vq);
> > > + need_wakeup = xsk_uses_need_wakeup(pool);
> > > +
> > > + if (need_wakeup && vring_size == sq->vq->num_free)
> > > + xsk_set_tx_need_wakeup(pool);
> > > +
> >
> > why are we doing this here?
> > the check after virtnet_xsk_xmit_batch not enough?
> > I vaguely think it's some kind of race we are closing?
> > Pls add a comment to explain.
>
> Hi, Michael. Thanks for your review.
>
> Yeah, it's for a race condition between user space and kernel
> space. I added a comment in V2, but it was too confusing, so
> I removed it 😢. I'll make it clearer and add it back in V4. The
> original comment is:
>
> * If the sq->vq is empty, and the tx ring is empty, and the user
> * submits an entry to the tx ring after virtnet_xsk_xmit_batch() and
> * before xsk_set_tx_need_wakeup(), we will lose the chance to wake
> * up the tx napi, so we have to set the need_wakeup flag here.
>
> And the logic is like this:
>
> Kernel: tx NAPI is woken up from skb_xmit_done() ->
> Kernel: sq->vq and xsk->tx_ring are both empty ->
> Kernel: call virtnet_xsk_xmit_batch()
>
> User: submit an entry to the xsk->tx_ring
> User: check the wakeup flag
> User: wakeup flag is not set, skip send()
>
> Kernel: call xsk_set_tx_need_wakeup(), because sq->vq is empty
>
> If we don't send more data, the data in the xsk->tx_ring will
> never be sent.
I'm not 100% sure I understand, but fixing a cross-CPU race with
extra checks alone, without synchronization or CPU memory barriers,
always gives me pause.
For example, an AI tool helped me write up this sequence:
1. Kernel: xsk_set_tx_need_wakeup stores NEED_WAKEUP (sits in store buffer)
2. Kernel: xsk_tx_peek_release_desc_batch - load, sees empty (reordered
before the store is globally visible)
3. Kernel: peek finds nothing, returns 0
4. Userspace: stores entry + producer
5. Userspace: loads flags - doesn't see NEED_WAKEUP yet (still in kernel's
store buffer)
6. Userspace: skips send()
7. Kernel: NEED_WAKEUP store finally becomes visible - too late
Seems legit?
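
To make the concern concrete, here is a minimal userspace sketch of that
store-buffering pattern. It is only a model, not the driver code: the
need_wakeup and producer variables are hypothetical stand-ins for the
NEED_WAKEUP flag and the tx ring producer index, and the C11 seq_cst
fences stand in for a full barrier such as smp_mb(). I am assuming here
that both sides issue such a barrier between their store and their load,
which I have not checked against the actual kernel or userspace code.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not the real xsk or virtio_net structures. */
static atomic_int need_wakeup;  /* models the NEED_WAKEUP flag */
static atomic_int producer;     /* models the tx ring producer index */
static int kernel_saw_entry;    /* what the "kernel" load observed */
static int user_saw_flag;       /* what the "user" load observed */

/* Models the kernel: publish the flag, then look at the tx ring. */
static void *kernel_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&need_wakeup, 1, memory_order_relaxed);
	/* Full barrier: the store above is ordered before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	kernel_saw_entry = atomic_load_explicit(&producer,
						memory_order_relaxed);
	return NULL;
}

/* Models userspace: publish an entry, then look at the flag. */
static void *user_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&producer, 1, memory_order_relaxed);
	/* Assumed full barrier on the user side as well. */
	atomic_thread_fence(memory_order_seq_cst);
	user_saw_flag = atomic_load_explicit(&need_wakeup,
					     memory_order_relaxed);
	return NULL;
}

/* Counts runs where both sides missed each other (the lost wakeup). */
static int count_lost_wakeups(int iterations)
{
	int lost = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t k, u;

		atomic_store(&need_wakeup, 0);
		atomic_store(&producer, 0);
		pthread_create(&k, NULL, kernel_side, NULL);
		pthread_create(&u, NULL, user_side, NULL);
		pthread_join(k, NULL);
		pthread_join(u, NULL);
		/* Kernel saw an empty ring AND user saw no flag. */
		if (!kernel_saw_entry && !user_saw_flag)
			lost++;
	}
	return lost;
}
```

With both fences in place the C11 memory model forbids the outcome where
the kernel side sees an empty ring and the user side sees no flag, so
count_lost_wakeups() should return 0. Drop either fence and that outcome
becomes permitted, which is the scenario in steps 1-7 above.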
> >
> > > sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
> > >
> > > + if (need_wakeup) {
> > > + if (vring_size == sq->vq->num_free)
> > > + /* we can't wake up by ourself, and it should be done
> > > + * by the user.
> > > + */
> > > + xsk_set_tx_need_wakeup(pool);
> > > + else
> > > + /* we can wake up from skb_xmit_done() */
> > > + xsk_clear_tx_need_wakeup(pool);
> >
> > But what if we don't have get tx napi so no wakeup in skb_xmit_done?
>
> Sorry, I'm not sure what "get tx napi" means here ;(
>
> There are entries in sq->vq, so skb_xmit_done() will be called after
> the entries in the ring are consumed by the HOST, right?
> Then, the corresponding sq->napi will be scheduled, as we ensure
> that tx napi is always enabled, which means napi->weight is not
> zero, in this commit:
> 1df5116a41a8 ("virtio_net: xsk: prevent disable tx napi")
Oh I forgot we did that. But can xsk bind when tx napi has already
been disabled previously?
> Right?
>
> Thanks!
> Menglong Dong
>
> >
> >
> > > + }
> > > +
> > > if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
> > > check_sq_full_and_disable(vi, vi->dev, sq);
> > >
> > > @@ -1470,9 +1488,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq,
> > > struct xsk_buff_pool *pool,
> > > u64_stats_add(&sq->stats.xdp_tx, sent);
> > > u64_stats_update_end(&sq->stats.syncp);
> > >
> > > - if (xsk_uses_need_wakeup(pool))
> > > - xsk_set_tx_need_wakeup(pool);
> > > -
> > > return sent;
> > > }
> > >
> > > --
> > > 2.54.0
> >
> >
> >
>
>
>