On Tue, Jun 09, 2026 at 08:07:14PM +0800, Li Chen wrote: > Hi, Hi Li Chen,
In case you missed it, a Sashiko AI review of this set has posted feedback. Please take a look. https://sashiko.dev/#/patchset/20260609120726.1714780-1-me%40linux.beauty -- Alison > > The nvdimm flush helper currently converts any non-zero provider flush > callback error to -EIO. That hides useful errno values from providers. For > example, virtio-pmem may fail child flush bio allocation with -ENOMEM, but > that is currently reported as -EIO by nvdimm_flush(). > > The raw failure seen in the local mkfs sanity test was: > > wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error > mkfs.ext4: Input/output error while writing out and closing file system > nd_region region0: dbg: nvdimm_flush rc=-5 > > The first two patches keep provider flush errors intact and make the > virtio-pmem child flush bio allocation use GFP_NOIO. async_pmem_flush() can > allocate a child flush bio from filesystem flush and writeback paths; > GFP_NOIO is a better fit than GFP_ATOMIC there because the allocation may > sleep but must not recurse into filesystem I/O reclaim. With these changes, > the same mkfs sanity test reached mkfs_rc=0, mount_rc=0, and umount_rc=0. > > The rest of the series addresses virtio-pmem request lifetime and broken > virtqueue handling. The virtio-pmem flush path uses a virtqueue cookie/token > to carry a per-request context through completion. Under broken virtqueue / > notify failure conditions, the submitter can return and free the request > object while the host/backend may still complete the published request. The > IRQ completion handler then dereferences freed memory when waking waiters, > which is reported by KASAN as a slab-use-after-free and may manifest as lock > corruption (e.g. "BUG: spinlock already unlocked") without KASAN. > > In addition, the flush path has two wait sites: one for virtqueue descriptor > availability (-ENOSPC from virtqueue_add_sgs()) and one for request > completion. If the virtqueue becomes broken, forward progress is no longer > guaranteed and these waiters may sleep indefinitely unless the driver > converges the failure and wakes all wait sites. > > This series addresses these issues: > > 1/7 nvdimm: preserve flush callback errors > Return provider flush callback errors directly from nvdimm_flush(). > > 2/7 nvdimm: virtio_pmem: use GFP_NOIO for child flush bio > Use GFP_NOIO for the child flush bio allocation. > > 3/7 nvdimm: virtio_pmem: always wake -ENOSPC waiters > Wake one -ENOSPC waiter for each reclaimed used buffer, decoupled from > token completion. > > 4/7 nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags > Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and > wq_buf_avail). > > 5/7 nvdimm: virtio_pmem: refcount requests for token lifetime > Refcount request objects so the token lifetime spans the window where it is > reachable through the virtqueue until completion/drain drops the virtqueue > reference. > > 6/7 nvdimm: virtio_pmem: converge broken virtqueue to -EIO > Track a device-level broken state to converge broken/notify failures to -EIO: > wake all waiters and drain/detach outstanding requests to complete them with > an error, and fail-fast new requests. > > 7/7 nvdimm: virtio_pmem: drain requests in freeze > Drain outstanding requests in freeze() before tearing down virtqueues so > waiters do not sleep indefinitely. > > Testing was done on QEMU x86_64 with a virtio-pmem device exported as > /dev/pmem0. This v4 series applies to v7.1-rc7, builds with > CONFIG_VIRTIO_PMEM=m, passes checkpatch, and passed the local repro checks > with a local-only virtqueue_kick() fault injection. I also checked that it > applies cleanly to next/master at 6e845bcb78c9 ("Add linux-next specific > files for 20260605"). > > Thanks, > Li Chen > > Changelog: > v3->v4: > - Rebased the series onto v7.1-rc7 so it applies cleanly to Linux 7.1-rc7. > - Update the allocation site in 6/7 from kmalloc(sizeof(*req_data), > GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code. > - Add 1/7 to preserve provider flush callback errors in nvdimm_flush(). > - Include the GFP_NOIO child flush bio allocation fix as 2/7. > - Renumber the old request lifetime and broken virtqueue fixes after the two > new flush error patches. > v2->v3: > - Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup > ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to > 2/5 (no functional change intended). > - Add log report to commit msg > - Fold the export fix into 4/5 to keep the series bisectable when > CONFIG_VIRTIO_PMEM=m. > v1->v2: add the export patch to fix compile issue. > > Links: > v3: https://lore.kernel.org/all/[email protected]/#t > v2: https://lore.kernel.org/all/[email protected]/ > v1: https://www.spinics.net/lists/kernel/msg5974818.html > > Li Chen (7): > nvdimm: preserve flush callback errors > nvdimm: virtio_pmem: use GFP_NOIO for child flush bio > nvdimm: virtio_pmem: always wake -ENOSPC waiters > nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags > nvdimm: virtio_pmem: refcount requests for token lifetime > nvdimm: virtio_pmem: converge broken virtqueue to -EIO > nvdimm: virtio_pmem: drain requests in freeze > > drivers/nvdimm/nd_virtio.c | 139 +++++++++++++++++++++++++++++------ > drivers/nvdimm/region_devs.c | 6 +- > drivers/nvdimm/virtio_pmem.c | 14 ++++ > drivers/nvdimm/virtio_pmem.h | 6 ++ > 4 files changed, 139 insertions(+), 26 deletions(-) > -- > 2.52.0

