Hi Stephen,

Answers inline. I have also sent out patch-set v16 incorporating the fixes from your AI review.
Internal Use - Confidential

> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Friday, June 5, 2026 9:46 AM
> To: Bathija, Pravin <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH v15 0/5] Support add/remove memory region and
> get-max-slots
>
>
> [EXTERNAL EMAIL]
>
> On Thu, 4 Jun 2026 23:57:18 +0000
> <[email protected]> wrote:
>
> > From: Pravin M Bathija <[email protected]>
> >
> > This is version v15 of the patchset and it incorporates the
> > recommendations made by Maxime Coquelin.
> >
> > Patch 4/5
> > - Changed VHOST_USER_REM_MEM_REG handler declaration from
> >   accepts_fd=true to accepts_fd=false, as the remove request does not
> >   expect FDs in ancillary data.
> > - Removed all close_msg_fds(ctx) calls from vhost_user_rem_mem_reg(), no
> >   longer needed since the handler is declared as not accepting FDs.
> > - Removed validate_msg_fds(dev, ctx, 0) check from
> >   vhost_user_rem_mem_reg(), as FD validation is now handled generically
> >   by the framework.
> > - Added targeted IOTLB cache invalidation in vhost_user_rem_mem_reg()
> >   using vhost_user_iotlb_cache_remove() for the removed region's GPA
> >   range, instead of the nuclear iotlb_flush_all() used by set_mem_table.
> >
> > This implementation has been extensively tested by doing Read/Write
> > I/O from multiple instances of fio + libblkio (front-end) talking to
> > spdk/dpdk (back-end) based drives. Tested with qemu front-end talking
> > to dpdk testpmd (back-end) performing add/removal of memory regions.
> > Also tested post-copy live migration after doing add_memory_region.
> >
> > Version Log:
> > Version v15 (Current version): Incorporate code review suggestions
> > from Maxime Coquelin as described above.
> >
> > Version v14: Incorporate code review suggestions from Stephen
> > Hemminger and Fengcheng Wen.
> > Changes from Fengcheng Wen review:
> > Patch 3/5
> > - Moved free_all_mem_regions() call sites in vhost_user_set_mem_table()
> >   from patch 4/5 to patch 3/5 so each commit compiles independently
> > Patch 4/5
> > - Renamed _dev_invalidate_vrings() to vhost_user_invalidate_vrings() to
> >   follow vhost naming convention
> > - Added comment explaining *pdev propagation through
> >   translate_ring_addresses / numa_realloc()
> > - Reordered local variables in vhost_user_add_mem_reg() and
> >   vhost_user_rem_mem_reg() by descending line length
> > - Shortened overlap check variable names (current_region_guest_start/end
> >   --> cur_start/end, proposed_region_guest_start/end -> new_start/end)
> > - Fixed DMA error path in vhost_user_add_mem_reg(): added
> >   free_new_region_no_dma label so async_dma_map_region(false) is not
> >   called when the map itself failed.
> > Changes from Stephen Hemminger review:
> > Patch 4/5
> > - vhost_user_add_mem_reg() now constructs a reply with the back-end's
> >   host mapping address in userspace_addr and returns
> >   RTE_VHOST_MSG_RESULT_REPLY per the vhost-user spec
> > - Added validate_msg_fds(dev, ctx, 0) in vhost_user_rem_mem_reg() to
> >   reject malformed messages with unexpected file descriptors
> > - Dropped unnecessary (uint64_t) cast in
> >   vhost_user_get_max_mem_slots()
> >
> > Version v13: Incorporate code review suggestions from Fengcheng Wen
> > Patch 2/5 Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and
> > memory_single to memreg Patches 3/5 and 4/5 Relocated function
> > remove_guest_pages from patch 3/5 to 4/5
> >
> > Version v12: Incorporate code review suggestions from Maxime Coquelin
> > and ai-code-review.
> > Patch 3/5
> > Refactored async_dma_map() to delegate to async_dma_map_region(),
> > eliminating code duplication between the two functions.
> > Restored original comments in async_dma_map_region() explaining why
> > ENODEV and EINVAL errors are ignored (these were stripped in v10)
> > Reverted unnecessary changes to vhost_user_postcopy_register() --
> > removed the host_user_addr == 0 checks and reg_msg_index indirection
> > that were added in v10, since this function is only called from
> > vhost_user_set_mem_table() where regions are always contiguous.
> >
> > Version v11: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 4/5
> > Fix incomplete cleanup in vhost_user_add_mem_reg() when
> > vhost_user_mmap_region() fails after the mmap succeeds (e.g.
> > add_guest_pages() realloc failure). The error path
> > now calls remove_guest_pages() and free_mem_region() to undo the
> > mapping and stale guest-page entries, preventing a leaked mmap and
> > slot reuse corruption. The plain close(fd) path is kept for pre-mmap
> > failures.
> >
> > Version v10: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 4/5
> > Moved dev_invalidate_vrings after free_mem_region, array compaction,
> > and nregions decrement. This ensures translate_ring_addresses only
> > sees surviving memory regions, preventing vring pointers from
> > resolving into a region that is about to be unmapped.
> >
> > Version v9: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 3/5
> > Restored max_guest_pages initial value to hardcoded 8 instead of
> > VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
> > Patch 4/5
> > Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
> > mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
> > Converted dev_invalidate_vrings from a plain function to a macro +
> > implementation function pair, accepting message ID as a parameter so
> > the static_assert reports the correct handler at each call site.
> > Updated dev_invalidate_vrings call in add_mem_reg to pass
> > VHOST_USER_ADD_MEM_REG as message ID.
> > Updated dev_invalidate_vrings call in rem_mem_reg to pass
> > VHOST_USER_REM_MEM_REG as message ID.
> >
> > Version v8: Incorporate code review suggestions from Stephen Hemminger.
> > rewrite async_dma_map_region function to iterate guest pages by host
> > address range matching change function dev_invalidate_vrings to accept
> > a double pointer to propagate pointer updates new function
> > remove_guest_pages was added add_mem_reg error path was narrowed to
> > only clean up the single failed region instead of destroying all
> > existing regions
> >
> > Version v7: Incorporate code review suggestions from Maxime Coquelin.
> > Add debug messages to vhost_postcopy_register function.
> >
> > Version v6: Added the enablement of this feature as a final patch in
> > this patch-set and other code optimizations as suggested by Maxime
> > Coquelin.
> >
> > Version v5: removed the patch that increased the number of memory
> > regions from 8 to 128. This will be submitted as a separate feature at
> > a later point after incorporating additional optimizations. Also
> > includes code optimizations as suggested by Feng Cheng Wen.
> >
> > Version v4: code optimizations as suggested by Feng Cheng Wen.
> >
> > Version v3: code optimizations as suggested by Maxime Coquelin and
> > Thomas Monjalon.
> >
> > Version v2: code optimizations as suggested by Maxime Coquelin.
> >
> > Version v1: Initial patch set.
> >
> > Pravin M Bathija (5):
> >   vhost: add user to mailmap and define to vhost hdr
> >   vhost: header defines for add/rem mem region
> >   vhost: refactor memory helper functions
> >   vhost: add mem region add/remove handlers
> >   vhost: enable configure memory slots
> >
> >  .mailmap               |   1 +
> >  lib/vhost/rte_vhost.h  |   4 +
> >  lib/vhost/vhost_user.c | 425 +++++++++++++++++++++++++++++++++++------
> >  lib/vhost/vhost_user.h |  10 +
> >  4 files changed, 378 insertions(+), 62 deletions(-)
> >
>
> I don't think this is ready to merge based on AI review.
> Did AI review with Opus 4.8 on a chat which has past context.
>
> Summary of v15 findings
>
>
> New in v15 (both patch 4/5, both errors):
>
> Use-after-free on the reply path: reg points into dev->mem->regions[], but
> dev_invalidate_vrings() -> translate_ring_addresses() -> numa_realloc() can
> relocate dev->mem. dev is refreshed via *pdev, reg is not, then
> reg->host_user_addr is read for the reply. Re-derive reg (or capture
> host_user_addr) after dev = *pdev.

Confirmed. reg points into dev->mem->regions[], and
dev_invalidate_vrings() -> translate_ring_addresses() -> numa_realloc()
can reallocate dev->mem, leaving reg dangling before
reg->host_user_addr is read. Fixed by re-deriving the region from the
refreshed dev (reg = &dev->mem->regions[dev->mem->nregions - 1]) after
dev = *pdev, before building the reply.

> ADD_MEM_REG reply sent unconditionally: handler always returns
> RESULT_REPLY, but the spec makes the mapping-address reply postcopy-only.
> In non-postcopy mode this desyncs the channel (no REPLY_ACK: the front-end
> never reads it; with REPLY_ACK: it expects a u64 ack, not a memreg). Gate the
> reply on dev->postcopy_listening, else return RESULT_OK -- same as
> SET_MEM_TABLE.

Confirmed against the vhost-user spec, which states the mapping-address
reply is postcopy-only ("In postcopy mode... the back-end replies with
the bases of the memory mapped region"). The reply is now gated on
dev->postcopy_listening: return RTE_VHOST_MSG_RESULT_REPLY in postcopy
mode, and RTE_VHOST_MSG_RESULT_OK otherwise, matching
VHOST_USER_SET_MEM_TABLE. This avoids desyncing the channel in
non-postcopy mode.

>
> Carried over from v13 (now in a different form):
>
> The v13 Warning (missing postcopy mapping-address reply) is addressed but
> mis-gated; correct fix is the conditional reply above. Until then postcopy
> correctness still isn't right.

Same fix as above: the reply is now gated on dev->postcopy_listening.
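
For reference, the two fixes described above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the v16 code: the mock_* types, mock_relocate_mem() and mock_add_mem_reg_tail() are hypothetical stand-ins for the lib/vhost structures, numa_realloc() and the tail of the add-region handler. It shows the region pointer being re-derived after the table may have moved, and the reply being produced only when postcopy_listening is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the lib/vhost structures; not the real layout. */
struct mock_region {
	uint64_t host_user_addr;
};

struct mock_mem {
	uint32_t nregions;
	struct mock_region regions[]; /* region table that may be relocated */
};

struct mock_dev {
	struct mock_mem *mem;
	bool postcopy_listening;
};

enum mock_result { MOCK_RESULT_ERR, MOCK_RESULT_OK, MOCK_RESULT_REPLY };

/*
 * Models the effect of a reallocation: the table is copied to a new
 * allocation and the old one is freed, so old pointers into it dangle.
 */
static int mock_relocate_mem(struct mock_dev *dev)
{
	size_t sz = sizeof(*dev->mem) +
		dev->mem->nregions * sizeof(struct mock_region);
	struct mock_mem *moved = malloc(sz);

	if (moved == NULL)
		return -1;
	memcpy(moved, dev->mem, sz);
	free(dev->mem);
	dev->mem = moved;
	return 0;
}

/* Models the tail of an add-region handler, after the region was appended. */
static enum mock_result
mock_add_mem_reg_tail(struct mock_dev *dev, uint64_t *reply_addr)
{
	struct mock_region *reg;

	assert(dev->mem != NULL && dev->mem->nregions > 0);

	if (mock_relocate_mem(dev) != 0)
		return MOCK_RESULT_ERR;

	/* Re-derive the pointer only now; one taken earlier would be stale. */
	reg = &dev->mem->regions[dev->mem->nregions - 1];

	/* Outside postcopy mode no mapping-address reply is sent. */
	if (!dev->postcopy_listening)
		return MOCK_RESULT_OK;

	*reply_addr = reg->host_user_addr;
	return MOCK_RESULT_REPLY;
}
```

Capturing host_user_addr into a local before the relocation would work equally well; re-deriving the pointer simply keeps any later use of reg safe too.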

