On Sun, May 10, 2026 at 01:33:18PM -0700, Zhu Yanjun wrote:
> On 2026/5/7 19:27, Bobby Eshleman wrote:
> > This series enables TCP devmem TX through netkit devices.
> >
> > Netkit now supports queue leasing. A physical NIC's RX queue can be
> > leased to a netkit guest interface inside a container namespace. This
> > gives the container a devmem-capable data path on the RX side (bind-rx,
> > etc.). On the TX side, the container process binds to its netkit guest
> > interface and sends traffic that netkit redirects (via BPF or ip
> > forwarding) to the physical NIC for DMA.
> >
> > Two things in the existing devmem TX path prevent this from working:
> >
> > 1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
> > forward a dmabuf-backed (unreadable) skb. This protects skbs from
> > landing on devices that don't have the IOMMU mappings for the backing
> > dmabuf or that don't speak netmem. Netkit, however, does not support
> > DMA and does not attempt to read unreadable skb pages, so it doesn't
> > break netmem (it is pure skb routing and redirection). It is
> > functionally capable of routing unreadable skbs, but the TX
> > validation path has no way to distinguish a device that will
> > actually attempt to DMA the skb from a device (like netkit) that
> > does not DMA but also does not break netmem.
> >
> > 2. bind_tx_doit uses the bound device as the DMA device. When the user
> > binds devmem TX to the netkit guest, the bind handler attempts to
> > create DMA mappings against netkit, which has no DMA capability and
> > no IOMMU mappings.
> >
> > This series solves these problems as follows:
> >
> > 1. Extend netmem_tx to two bits, assigned to one of three values:
> >
> > NETMEM_TX_NONE - netmem not supported
> > NETMEM_TX_DMA - netmem supported and performs DMA
> > NETMEM_TX_NO_DMA - netmem supported, but does not DMA
> >
> > With these bits, phys devices can set NETMEM_TX_DMA and devices like
> > netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
> > DMA-capable netdev exactly matches the bound device, guaranteeing the
> > correct mapping of the bound dmabuf. The validation TX path also
> > allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
> > will not misuse netmem or run into IOMMU faults. After redirection or
> > routing, once the skb finally makes its way through the stack to a
> > physical device's TX path, the above NETMEM_TX_DMA check is performed
> > again to guarantee the device has the appropriate binding/mappings.
> >
> > 2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
> > finds the phys TX device and binds to that instead. For the netkit
> > case, if it has been leased a queue from a DMA-capable device
> > already, then the bind action is performed on the DMA-capable device
> > instead and the dmabuf is mapped correctly.
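
For readers following along, the two rules above can be modelled in
userspace roughly as below. This is only a sketch under assumptions, not
the code from the patches: struct model_dev, struct model_skb, the
lease_src field and both helper functions are hypothetical names made up
for illustration, and returning NULL when a NO_DMA device has no leased
queue is my assumption about the failure case. Only the three
NETMEM_TX_* values come from the cover letter.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three modes named in the cover letter. */
enum netmem_tx_mode {
	NETMEM_TX_NONE = 0,	/* netmem not supported */
	NETMEM_TX_DMA,		/* netmem supported, device performs DMA */
	NETMEM_TX_NO_DMA,	/* netmem supported, device does not DMA */
};

/* Hypothetical stand-in for a netdev; not the kernel's struct net_device. */
struct model_dev {
	const char *name;
	enum netmem_tx_mode netmem_tx;
	/* Hypothetical: physical device that leased a queue to this one. */
	const struct model_dev *lease_src;
};

/* Hypothetical stand-in for an skb carrying dmabuf-backed frags. */
struct model_skb {
	bool unreadable;
	/* Device the dmabuf was mapped for at bind time. */
	const struct model_dev *bound_dev;
};

/* Rule 1: may this skb be transmitted on this device? */
static bool model_xmit_allowed(const struct model_skb *skb,
			       const struct model_dev *dev)
{
	if (!skb->unreadable)
		return true;

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		/* Pure routing/redirection: the payload is never touched. */
		return true;
	case NETMEM_TX_DMA:
		/* A DMA-ing device must be exactly the bound device. */
		return skb->bound_dev == dev;
	case NETMEM_TX_NONE:
	default:
		return false;
	}
}

/* Rule 2: which device should the dmabuf actually be mapped against? */
static const struct model_dev *
model_resolve_bind_dev(const struct model_dev *dev)
{
	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;

	if (dev->netmem_tx == NETMEM_TX_NO_DMA && dev->lease_src &&
	    dev->lease_src->netmem_tx == NETMEM_TX_DMA)
		return dev->lease_src;

	/* Assumption: no DMA-capable device behind it, so the bind fails. */
	return NULL;
}
```

In this model, binding on a netkit-like device resolves to the leasing
NIC, the skb passes validation on the netkit hop, and the exact-match
check is applied again when it reaches a DMA-capable device.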
> >
> > ---
> > Changes in v3:
> > - Fix validate_xmit_unreadable_skb() logic for non-devmem
> > unreadable niovs (should not be dropped) (Sashiko)
> > - Simplify lock handling in bind_tx, no premature release (Jakub)
> > - Split NO_DMA changes into separate patch (Jakub)
> > - Fixed some pylint issues, one required an additional patch ("selftests:
> > drv-net: make attr _nk_guest_ifname public") to rename a variable from
> > private to public
> > - See per-patch changelist for more detailed changes
> > - Link to v2:
> > https://lore.kernel.org/r/[email protected]
> >
> > Changes in v2:
> > - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> > - In validate_xmit_unreadable_skb(), check netmem_tx mode before
> > inspecting frags (Jakub)
> > - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
> > fix lockdep (Sashiko)
> > - Move require_devmem() into individual test functions so KsftSkipEx goes
> > up to ksft_run() (Sashiko)
> > - Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
> > - Link to v1:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> >
> > ---
> > Bobby Eshleman (8):
> > net: convert netmem_tx flag to enum
> > net: netkit: declare NETMEM_TX_NO_DMA mode
> > net: devmem: support TX over NETMEM_TX_NO_DMA devices
>
> I applied this patchset to my local kernel tree, built a new kernel
> image, and booted it in my test environment. All the test cases appear
> to pass.
>
> I do not see any regressions from this patchset in my test
> environment.
>
> Zhu Yanjun

Thanks for testing!

Best,
Bobby