From: Jason Wang <jasow...@redhat.com>

[ Upstream commit b4ba1207d45adaafa2982c035898b36af2d3e518 ]

This patch fails vhost_add_used_n() early when __vhost_add_used() fails to
make sure used idx is not updated with stale used ring information.

Reported-by: Eugenio Pérez <epere...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
Message-Id: <20250714084755.11921-2-jasow...@redhat.com>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Tested-by: Lei Yang <leiy...@redhat.com>
Signed-off-by: Sasha Levin <sas...@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## Bug Analysis

The commit fixes a critical error handling bug in `vhost_add_used_n()`
where the function continues execution even after `__vhost_add_used_n()`
fails on line 2776. Looking at the code:

1. **The Bug**: In the original code at drivers/vhost/vhost.c:2776, when
   the second call to `__vhost_add_used_n()` fails and returns an error,
   the function doesn't immediately return. Instead, it continues to:
   - Execute memory barrier (`smp_wmb()`) at line 2779
   - Update the used index via `vhost_put_used_idx()` at line 2780
   - Potentially log the used index update at lines 2784-2791

2. **Impact**: This means the vhost driver updates the used ring index
   even when the actual used ring entries weren't successfully written.
   This creates a **data corruption scenario** where:
   - The guest sees an updated used index
   - But the corresponding used ring entries contain stale/invalid data
   - This can lead to guest crashes, data corruption, or unpredictable
     behavior

3. **The Fix**: The patch adds a simple but crucial check at lines
   2778-2779 (after applying):
   ```c
   if (r < 0)
       return r;
   ```
   This ensures the function returns immediately upon failure, preventing
   the index from being updated with invalid ring state.

## Stable Backport Criteria Assessment

1. **Bug Fix**: ✓ This fixes a real bug that can cause data corruption in
   vhost operations
2. **Small and Contained**: ✓ The fix is only 3 lines of code - extremely
   minimal
3. **No Side Effects**: ✓ The change only adds proper error handling, no
   behavioral changes for success cases
4. **No Architectural Changes**: ✓ Simple error check addition, no design
   changes
5. **Critical Subsystem**: ✓ vhost is used for virtualization (virtio
   devices), affecting VMs and containers
6. **Clear Bug Impact**: ✓ Data corruption in guest-host communication is
   a serious issue
7. **Follows Stable Rules**: ✓ Important bugfix with minimal regression
   risk

## Additional Evidence

- The bug was reported by Eugenio Pérez from Red Hat, indicating it was
  found in production/testing environments
- The fix has been tested (as indicated by "Tested-by: Lei Yang")
- The function `__vhost_add_used_n()` can fail with -EFAULT when
  `vhost_put_used()` fails (line 2738-2740)
- The first call to `__vhost_add_used_n()` already has proper error
  handling (lines 2770-2772), making this an inconsistency bug

This is a textbook example of a stable backport candidate: a small,
obvious fix for a real bug that can cause data corruption in a critical
kernel subsystem.

 drivers/vhost/vhost.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3a5ebb973dba..d1d3912f4804 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2775,6 +2775,9 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	}
 	r = __vhost_add_used_n(vq, heads, count);
+	if (r < 0)
+		return r;
+
 
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
 	if (vhost_put_used_idx(vq)) {
-- 
2.39.5
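
The failure mode described in the analysis above (publishing an index after the entry write has failed) can be illustrated with a small userspace sketch. This is not the kernel code: `struct toy_ring`, `toy_write_entries()`, `toy_add_used_n()` and the `fail_writes` hook are hypothetical names invented for illustration, and a C11 release fence stands in for `smp_wmb()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical stand-in for a used ring; not the kernel's structures. */
struct toy_ring {
	uint32_t entries[RING_SIZE];
	uint16_t last_used_idx;  /* producer's private write position */
	uint16_t published_idx;  /* index the consumer is allowed to read */
	int fail_writes;         /* illustration hook: simulate a faulting copy */
};

/* Write `count` entries; returns -EFAULT when the simulated copy fails. */
static int toy_write_entries(struct toy_ring *r, const uint32_t *heads,
			     unsigned int count)
{
	assert(count <= RING_SIZE);
	if (r->fail_writes)
		return -EFAULT;
	for (unsigned int i = 0; i < count; i++)
		r->entries[(r->last_used_idx + i) % RING_SIZE] = heads[i];
	r->last_used_idx += count;
	return 0;
}

/* Publish the new index only if the entries were actually written. */
static int toy_add_used_n(struct toy_ring *r, const uint32_t *heads,
			  unsigned int count)
{
	int ret = toy_write_entries(r, heads, count);

	/* The early return is the point: without it, the index below
	 * would be published even though no entries were written. */
	if (ret < 0)
		return ret;

	/* Userspace stand-in for smp_wmb(): order entry writes before
	 * the index update. */
	atomic_thread_fence(memory_order_release);
	r->published_idx = r->last_used_idx;
	return 0;
}
```

With a zero-initialized `struct toy_ring`, a successful call advances `published_idx`, while a call made with `fail_writes` set returns `-EFAULT` and leaves `published_idx` unchanged, so a consumer never sees an index covering entries that were not written.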