vsock: suppress EHOSTUNREACH fast-fail during CPR pause

Stefano Garzarella Tue, 16 Jun 2026 07:32:55 -0700

On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:

From: "Denis V. Lunev" <[email protected]>


Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when

Please followhttps://docs.kernel.org/process/submitting-patches.html#describe-your-changeson how to refer to a commit.

guest isn't ready") added a fast-fail in vhost_transport_send_pkt().  It
rejects every host send with -EHOSTUNREACH until the destination calls
SET_RUNNING(1).  The fast-fail condition checks whether device's backends
are dropped, and if they're, the guest is considered to be not ready.

Okay, so it's not a regression, I mean without this series that patch isnot adding any regression, no?


If it's the case, I'll change the wording in the cover letter.


However, there might be other reasons for backends to be nulled.  In
particular, when QEMU is performing CPR (checkpoint-restore) migration,
device ownership is being RESET and SET again, which leads to backends
drop and reattach.  If we end up connecting during this window, an
AF_VSOCK client gets -EHOSTUNREACH, which is wrong.

Please add this change before starting to support VHOST_RESET_OWNERioctl in vhost-vsock, otherwise we are breaking the bisectability.


Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
backend was previously live, cleared by vhost_vsock_start(). When set,
vhost_transport_send_pkt() queues the skb instead of fast-failing; the
existing kick of send_pkt_work in vhost_vsock_start() drains it on
resume. A device that has never run keeps cpr_paused == false and the
boot-time fast-fail behaviour is preserved.

Pair the cpr_paused store with the backend store using an
smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
architecture never observes (NULL backend, !paused):

Signed-off-by: Denis V. Lunev <[email protected]>
---
drivers/vhost/vsock.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e629886e5cf8..bcaba36becd7 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -61,6 +61,7 @@ struct vhost_vsock {

        u32 guest_cid;
        bool seqpacket_allow;
+       bool cpr_paused;        /* between stop and next start */
};

static u32 vhost_transport_get_local_cid(void)
@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net 
*net)
         * the mutex would be too expensive in this hot path, and we already 
have
         * all the outcomes covered: if the backend becomes NULL right after 
the check,
         * vhost_transport_do_send_pkt() will check it under the mutex anyway.
+        *
+        * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
+        * The kick in vhost_vsock_start() will drain them on resume.
         */
        if 
(unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
-               rcu_read_unlock();
-               kfree_skb(skb);
-               return -EHOSTUNREACH;
+               smp_rmb();      /* pairs with smp_wmb() in start/drop_backends 
*/
+               if (!READ_ONCE(vsock->cpr_paused)) {

Can we avoid this which is not really readable and maybe add a singlevariable to control the fast-fail at all?

I mean replacing both cpr_paused + backend-pointer with a single`started` flag: set it to false at open, true on start viasmp_store_release(), back to false on normal stop, and leave it trueduring CPR pause.


The reader in send_pkt can do just:

    if (!smp_load_acquire(&vsock->started))
        return -EHOSTUNREACH;

WDYT?

+                       rcu_read_unlock();
+                       kfree_skb(skb);
+                       return -EHOSTUNREACH;
+               }

That said claude here is reporting a potential issue that I think weshould consider:After VHOST_RESET_OWNER, the guest CID stays in the hash, sovhost_transport_send_pkt() can still find the vsock, skip thefast-fail (cpr_paused=true), and call vhost_vq_work_queue() whilevhost_workers_free() is freeing workers without a synchronize_rcu()— risking a use-after-free. Also, any send_pkt_work queued betweenthe last flush and worker teardown gets its VHOST_WORK_QUEUED bitstuck (the vhost task exits without draining), deadlockinghost→guest traffic after restart.

A synchronize_rcu() in vhost_workers_free() between thercu_assign_pointer(NULL) loop and the destroy loop would close theuse-after-free, and reinitializing send_pkt_work viavhost_work_init() after vhost_dev_reset_owner() returns would clearthe stuck QUEUED bit.

        }

        if (virtio_vsock_skb_reply(skb))
@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
                mutex_unlock(&vq->mutex);
        }

+       smp_wmb();      /* pairs with smp_rmb() in send_pkt */
+       WRITE_ONCE(vsock->cpr_paused, false);
+
        /* Some packets may have been queued before the device was started,
         * let's kick the send worker to send them.
         */
@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock 
*vsock)

        lockdep_assert_held(&vsock->dev.mutex);

+       if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
+               WRITE_ONCE(vsock->cpr_paused, true);
+               smp_wmb();      /* pairs with smp_rmb() in send_pkt */
+       }


Why here and not in vhost_vsock_reset_owner()?

Also having this here will set it to true also withVHOST_VSOCK_SET_RUNNING(0), is that right?


Thanks,
Stefano

+
        for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
                vq = &vsock->vqs[i];

@@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct 
file *file)

        vsock->guest_cid = 0; /* no CID assigned yet */
        vsock->seqpacket_allow = false;
+       vsock->cpr_paused = false;

        atomic_set(&vsock->queued_replies, 0);

--
2.47.1

Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause

Reply via email to