Hi,

This series started as a virtio-pmem request lifetime and broken virtqueue fix, but the rerolls have picked up several related flush-path fixes found during local testing and review. Since the series is now broader than the original lifetime bug, this cover letter calls out where the patches came from.

The nvdimm flush helper maps provider flush failures to -EIO. That should remain the default for provider/backend failures because host-side errors are still best reported as generic I/O errors to the guest. However, virtio-pmem may also fail a guest-local flush request allocation with -ENOMEM before any request is submitted to the host. Reporting that resource failure as -EIO makes memory pressure look like media failure.

The raw failure seen in the local mkfs sanity test was:

  wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
  mkfs.ext4: Input/output error while writing out and closing file system
  nd_region region0: dbg: nvdimm_flush rc=-5

Patch 1 comes from that local failure, with the error policy narrowed after Pankaj pointed out that host/backend provider errors should not all be exposed directly to the guest. It now preserves only -ENOMEM and keeps other provider flush failures mapped to -EIO.

Patches 2 and 3 come from review of the pmem flush path. Patch 2 keeps a failed REQ_PREFLUSH from being overwritten after data copy, and patch 3 is the dataless-bio guard added after the Sashiko review.

Patch 4 comes from the local child flush bio allocation failure, but v7 reworks the v6 synchronous FUA approach after Pankaj noted that the old child flush bio path completed asynchronously. This version removes the child bio while keeping parent bio completion asynchronous: the provider returns NVDIMM_FLUSH_ASYNC, queues ordered WQ_MEM_RECLAIM work, and completes the parent bio after virtio_pmem_flush() finishes.

Patch 5 is the remaining allocation-policy follow-up for the actual virtio-pmem flush request object, not for a child bio.

Patches 6 and 7 are the older waiter fixes. Patch 6 wakes one -ENOSPC waiter for each reclaimed used buffer, and patch 7 makes the wait flags explicit READ_ONCE()/WRITE_ONCE() accesses. Pankaj asked for those changes to be split across patches, and patch 7 carries his Acked-by.

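For readers skimming the series, the patch 1 error policy described above can be sketched in a few lines of plain userspace C. This is only an illustration, not the code in the patch: flush_policy_map_error() and flush_policy_selftest() are hypothetical names, and passing non-negative return values through unchanged is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper for illustration only; this is not the kernel's
 * nvdimm_flush(). It models just the error mapping described for patch 1.
 */
static int flush_policy_map_error(int rc)
{
	if (rc >= 0)
		return rc;	/* assumption: non-negative values pass through */
	if (rc == -ENOMEM)
		return rc;	/* guest-local allocation failure stays visible */
	return -EIO;		/* any other provider/backend failure */
}

/* Small self-check of the mapping; plain userspace C, no kernel headers. */
static void flush_policy_selftest(void)
{
	assert(flush_policy_map_error(0) == 0);
	assert(flush_policy_map_error(-ENOMEM) == -ENOMEM);
	assert(flush_policy_map_error(-ENOSPC) == -EIO);
	assert(flush_policy_map_error(-EIO) == -EIO);
}
```

With this mapping, memory pressure in the guest surfaces as -ENOMEM, while every other provider failure still reaches the caller as a generic -EIO.
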
Patch 8 is the original KASAN use-after-free fix for the request token lifetime. Patches 9 and 10 are follow-up hardening in the same completion path: order response publication before the submitter reads resp.ret, and keep the DMA_FROM_DEVICE response buffer away from CPU-owned request fields.

Patch 11 addresses the broken virtqueue / notify failure path reported by LKP and reproduced locally with fault injection. It also serializes async parent-bio flush work against broken-state publication, so remove/freeze cannot drain the workqueue before a racing FUA bio queues new completion work.

Patch 12 handles teardown: it drains requests across freeze/remove and also addresses the Sashiko-reported req_vq-after-free/NULL-deref class by clearing req_vq after del_vqs() and making the drain helper tolerate a NULL queue. It also stops the submit path from checking req_vq after the broken state is visible.

The original repros were on QEMU x86_64 with a virtio-pmem device exported as /dev/pmem0. For this v7 reroll, the series applies to v7.1-rc7.

Thanks,
Li Chen

Changelog:

v6->v7:
- Address Pankaj's feedback on nvdimm_flush() error policy.
- Preserve only -ENOMEM from provider flush callbacks and continue to map other provider/backend failures to -EIO.
- Address Pankaj's feedback on the FUA flush behavior: replace the v6 synchronous FUA path with provider-owned asynchronous parent bio completion.
- Add NVDIMM_FLUSH_ASYNC and use ordered WQ_MEM_RECLAIM work to run virtio_pmem_flush() and complete the parent bio after the host flush.
- Keep GFP_NOIO for the virtio-pmem request allocation, but no longer describe it as a child bio allocation fix.
- Add Pankaj's Acked-by on the READ_ONCE()/WRITE_ONCE() patch.
- Serialize async parent-bio flush work against broken-state publication in the broken-virtqueue patch, so remove/freeze cannot drain the workqueue before a racing FUA bio queues new completion work.
- Fold the Sashiko-reported req_vq NULL-deref fix into the freeze/remove drain patch.
- Update commit messages and this cover letter to describe patch origins.

v5->v6:
- Address Sashiko review feedback:
  - Add a data-loop guard for dataless bios in pmem_submit_bio().
  - Replace the child flush bio allocation with synchronous FUA flushing.
  - Keep GFP_NOIO only for the virtio-pmem request allocation.
  - Publish request completion with release/acquire ordering.
  - Isolate the DMA_FROM_DEVICE response buffer from CPU-owned fields.
  - Wake the in-flight host-completion waiter when marking the queue broken.
  - Clear req_vq after del_vqs() and make drain tolerate a NULL queue.

v4->v5:
- Address review feedback about REQ_PREFLUSH ordering and active virtqueue detach.
- Add 2/8 so a failed REQ_PREFLUSH fails the bio before any data copy, and make REQ_PREFLUSH use a synchronous provider flush instead of a deferred child bio.
- Rework broken-queue handling so runtime failure marking only stops new submissions and wakes local -ENOSPC waiters; used/unused token draining is done after device reset in remove() and freeze().
- Remove the broken-state shortcut from the host-completion wait so the submitter never reads an uninitialized response field.
- Keep the raw broken-virtqueue dmesg in 7/8 while updating the teardown rationale.
- Renumber the old virtio-pmem fixes after the new pmem PREFLUSH patch.

v3->v4:
- Rebased the series onto v7.1-rc7 so it applies cleanly to Linux 7.1-rc7.
- Update the allocation site in 6/7 from kmalloc(sizeof(*req_data), GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code.
- Add 1/7 to preserve provider flush callback errors in nvdimm_flush().
- Include the GFP_NOIO child flush bio allocation fix as 2/7.
- Renumber the old request lifetime and broken virtqueue fixes after the two new flush error patches.

v2->v3:
- Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to 2/5 (no functional change intended).
- Add log report to commit msg.
- Fold the export fix into 4/5 to keep the series bisectable when CONFIG_VIRTIO_PMEM=m.

v1->v2:
- Add the export patch to fix compile issue.

Links:
v6: https://lore.kernel.org/all/[email protected]/
v5: https://lore.kernel.org/all/[email protected]/
v4: https://lore.kernel.org/all/[email protected]/
v3: https://lore.kernel.org/all/[email protected]/#t
v2: https://lore.kernel.org/all/[email protected]/
v1: https://www.spinics.net/lists/kernel/msg5974818.html

Li Chen (12):
  nvdimm: preserve flush callback -ENOMEM
  nvdimm: pmem: keep PREFLUSH before data writes
  nvdimm: pmem: guard data loop for dataless bios
  nvdimm: virtio_pmem: stop allocating child flush bio
  nvdimm: virtio_pmem: use GFP_NOIO for flush requests
  nvdimm: virtio_pmem: always wake -ENOSPC waiters
  nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
  nvdimm: virtio_pmem: refcount requests for token lifetime
  nvdimm: virtio_pmem: publish done with release/acquire
  nvdimm: virtio_pmem: isolate DMA request buffers
  nvdimm: virtio_pmem: converge broken virtqueue to -EIO
  nvdimm: virtio_pmem: drain requests in freeze

 drivers/nvdimm/nd_virtio.c   | 265 +++++++++++++++++++++++++++++------
 drivers/nvdimm/pmem.c        |  51 ++++---
 drivers/nvdimm/region_devs.c |   5 +-
 drivers/nvdimm/virtio_pmem.c |  65 ++++++++-
 drivers/nvdimm/virtio_pmem.h |  22 ++-
 include/linux/libnvdimm.h    |   9 ++
 6 files changed, 343 insertions(+), 74 deletions(-)

-- 
2.52.0

