On Thu, 11 Dec 2025 at 11:08, Stefano Garzarella <[email protected]> wrote:
>
> On Thu, 11 Dec 2025 at 10:10, Stefano Garzarella <[email protected]> wrote:
> >
> > On Wed, Dec 10, 2025 at 04:00:19PM +0100, Melbin K Mathew wrote:
> > >The virtio vsock transport currently derives its TX credit directly
> > >from peer_buf_alloc, which is set from the remote endpoint's
> > >SO_VM_SOCKETS_BUFFER_SIZE value.
> >
> > Why did you remove the target tree [net] from the tags?
> >
> > Also, this is a v2, so the tags should have been [PATCH net v2]; please
> > check this in future versions, more info:
> >
> > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#subject-line
> >
> > >
> > >On the host side this means that the amount of data we are willing to
> > >queue for a connection is scaled by a guest-chosen buffer size,
> > >rather than the host's own vsock configuration. A malicious guest can
> > >advertise a large buffer and read slowly, causing the host to allocate
> > >a correspondingly large amount of sk_buff memory.
> > >
> > >Introduce a small helper, virtio_transport_peer_buf_alloc(), that
> > >returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume
> > >peer_buf_alloc:
> > >
> > > - virtio_transport_get_credit()
> > > - virtio_transport_has_space()
> > > - virtio_transport_seqpacket_enqueue()
> > >
> > >This ensures the effective TX window is bounded by both the peer's
> > >advertised buffer and our own buf_alloc (already clamped to
> > >buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote guest
> > >cannot force the host to queue more data than allowed by the host's
> > >own vsock settings.
> > >
> > >On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
> > >32 guest vsock connections advertising 2 GiB each and reading slowly
> > >drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB, and the system only
> > >recovered after killing the QEMU process.
> > >
> > >With this patch applied, rerunning the same PoC yields:
> > >
> > > Before:
> > >   MemFree:       ~61.6 GiB
> > >   MemAvailable:  ~62.3 GiB
> > >   Slab:          ~142 MiB
> > >   SUnreclaim:    ~117 MiB
> > >
> > > After 32 high-credit connections:
> > >   MemFree:       ~61.5 GiB
> > >   MemAvailable:  ~62.3 GiB
> > >   Slab:          ~178 MiB
> > >   SUnreclaim:    ~152 MiB
> > >
> > >i.e. only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the
> > >guest remains responsive.
> >
> > I think we should include a summary here of what you replied to Michael
> > about other transports.
> >
> > I can't find your reply in the archive, but I mean the reply to
> > https://lore.kernel.org/netdev/[email protected]/
> >
> > >
> > >Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko")
> > >Suggested-by: Stefano Garzarella <[email protected]>
> > >Signed-off-by: Melbin K Mathew <[email protected]>
> > >---
> > > net/vmw_vsock/virtio_transport_common.c | 27 ++++++++++++++++++++++---
> > > 1 file changed, 24 insertions(+), 3 deletions(-)
> > >
> > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > >index dcc8a1d58..02eeb96dd 100644
> > >--- a/net/vmw_vsock/virtio_transport_common.c
> > >+++ b/net/vmw_vsock/virtio_transport_common.c
> > >@@ -491,6 +491,25 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
> > > }
> > > EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
> > >
> > >+/*
> > >+ * Return the effective peer buffer size for TX credit computation.
> >
> > nit: block comments in this file don't leave the first line empty, so
> > I'd follow that style:
> >
> > @@ -491,8 +491,7 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
> >
> > -/*
> > - * Return the effective peer buffer size for TX credit computation.
> > +/* Return the effective peer buffer size for TX credit computation.
> >   *
> >   * The peer advertises its receive buffer via peer_buf_alloc, but we
> >   * cap that to our local buf_alloc (derived from
> >
> > >+ *
> > >+ * The peer advertises its receive buffer via peer_buf_alloc, but we
> > >+ * cap that to our local buf_alloc (derived from
> > >+ * SO_VM_SOCKETS_BUFFER_SIZE and already clamped to buffer_max_size)
> > >+ * so that a remote endpoint cannot force us to queue more data than
> > >+ * our own configuration allows.
> > >+ */
> > >+static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
> > >+{
> > >+	u32 peer = vvs->peer_buf_alloc;
> > >+	u32 local = vvs->buf_alloc;
> > >+
> > >+	if (peer > local)
> > >+		return local;
> > >+	return peer;
> > >+}
> > >+
> >
> > I think here Michael was suggesting this:
> >
> > @@ -502,12 +502,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
> >   */
> >  static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
> >  {
> > -	u32 peer = vvs->peer_buf_alloc;
> > -	u32 local = vvs->buf_alloc;
> > -
> > -	if (peer > local)
> > -		return local;
> > -	return peer;
> > +	return min(vvs->peer_buf_alloc, vvs->buf_alloc);
> >  }
> >
> > > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > > {
> > > 	u32 ret;
> > >@@ -499,7 +518,8 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > > 		return 0;
> > >
> > > 	spin_lock_bh(&vvs->tx_lock);
> > >-	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > >+	ret = virtio_transport_tx_buf_alloc(vvs) -
> > >+	      (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > > 	if (ret > credit)
> > > 		ret = credit;
> > > 	vvs->tx_cnt += ret;
> > >@@ -831,7 +851,7 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > >
> > > 	spin_lock_bh(&vvs->tx_lock);
> > >
> > >-	if (len > vvs->peer_buf_alloc) {
> > >+	if (len > virtio_transport_tx_buf_alloc(vvs)) {
> > > 		spin_unlock_bh(&vvs->tx_lock);
> > > 		return -EMSGSIZE;
> > > 	}
> > >@@ -882,7 +902,8 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> > > 	struct virtio_vsock_sock *vvs = vsk->trans;
> > > 	s64 bytes;
> > >
> > >-	bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > >+	bytes = (s64)virtio_transport_tx_buf_alloc(vvs) -
> > >+		(vvs->tx_cnt - vvs->peer_fwd_cnt);
> >
> > nit: please align this:
> >
> > @@ -903,7 +898,7 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> >  	s64 bytes;
> >
> >  	bytes = (s64)virtio_transport_tx_buf_alloc(vvs) -
> > -		(vvs->tx_cnt - vvs->peer_fwd_cnt);
> > +	        (vvs->tx_cnt - vvs->peer_fwd_cnt);
> >  	if (bytes < 0)
> >  		bytes = 0;
> >
> >
> > Just minor things, but the patch LGTM, thanks!
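
Just to spell out the arithmetic this caps: here is a tiny userspace model
of the credit computation, purely for illustration (the struct below is a
simplified stand-in for the relevant virtio_vsock_sock fields, and the
numbers are made up, e.g. a 256 KiB local buffer vs. a 2 GiB
peer-advertised one):

/* Standalone model of the TX credit computation after the fix: the
 * window is bounded by min(peer_buf_alloc, buf_alloc) minus the bytes
 * still in flight (tx_cnt - peer_fwd_cnt).
 */
#include <stdint.h>
#include <stdio.h>

/* Simplified, illustrative stand-in for the fields of
 * struct virtio_vsock_sock used by the credit computation.
 */
struct credit_model {
	uint32_t buf_alloc;      /* our local buffer size */
	uint32_t peer_buf_alloc; /* buffer size advertised by the peer */
	uint32_t tx_cnt;         /* bytes we have sent so far */
	uint32_t peer_fwd_cnt;   /* bytes the peer reported as consumed */
};

static uint32_t tx_buf_alloc(const struct credit_model *m)
{
	/* Same idea as virtio_transport_tx_buf_alloc(): never trust a
	 * peer-advertised buffer larger than our own configuration.
	 */
	return m->peer_buf_alloc < m->buf_alloc ?
	       m->peer_buf_alloc : m->buf_alloc;
}

static uint32_t get_credit(const struct credit_model *m, uint32_t want)
{
	uint32_t credit = tx_buf_alloc(m) - (m->tx_cnt - m->peer_fwd_cnt);

	return credit < want ? credit : want;
}

int main(void)
{
	/* The peer (guest) advertises 2 GiB, but we only allow 256 KiB. */
	struct credit_model m = {
		.buf_alloc	= 256 * 1024,
		.peer_buf_alloc	= 2U * 1024 * 1024 * 1024,
		.tx_cnt		= 200 * 1024,
		.peer_fwd_cnt	= 0,
	};

	/* Before the fix the window would be ~2 GiB; now it is capped. */
	printf("credit for a 1 MiB request: %u bytes\n",
	       get_credit(&m, 1024 * 1024));
	return 0;
}

With the cap in place the sender can never have more than its own
buf_alloc outstanding per connection, no matter how large a buffer the
guest advertises, which is what keeps the host-side sk_buff usage bounded
in the PoC numbers above.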
>
> I just noticed that vsock_test is now failing because one peer (the
> client) tries to send more than the TX buffer while the RX side is
> waiting for the whole data.
>
> This should fix the test:
>
> From b69ca1fd3d544345b02cedfbeb362493950a87c1 Mon Sep 17 00:00:00 2001
> From: Stefano Garzarella <[email protected]>
> Date: Thu, 11 Dec 2025 10:55:06 +0100
> Subject: [PATCH 1/1] vsock/test: fix seqpacket message bounds test
>
> From: Stefano Garzarella <[email protected]>
>
> The test requires the sender (client) to send all messages before waking
> up the receiver (server).
> Since virtio-vsock had a bug and did not respect the size of the TX
> buffer, this test worked, but now that we have fixed the bug, it hangs
> because the sender fills the TX buffer before waking up the receiver.
>
> Set the buffer size in the sender (client) as well, as we already do for
> the receiver (server).
>
> Fixes: 5c338112e48a ("test/vsock: rework message bounds test")
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>  tools/testing/vsock/vsock_test.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> index 9e1250790f33..af6665ed19d5 100644
> --- a/tools/testing/vsock/vsock_test.c
> +++ b/tools/testing/vsock/vsock_test.c
> @@ -351,6 +351,7 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
>
>  static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
>  {
> +	unsigned long long sock_buf_size;
>  	unsigned long curr_hash;
>  	size_t max_msg_size;
>  	int page_size;
> @@ -363,6 +364,16 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
>  		exit(EXIT_FAILURE);
>  	}
>
> +	sock_buf_size = SOCK_BUF_SIZE;
> +
> +	setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE,
> +			     sock_buf_size,
> +			     "setsockopt(SO_VM_SOCKETS_BUFFER_MAX_SIZE)");
> +
> +	setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
> +			     sock_buf_size,
> +			     "setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
> +
>  	/* Wait, until receiver sets buffer size. */
>  	control_expectln("SRVREADY");
>
> --
> 2.52.0
>
> Please add that patch to a series (e.g. v3) which includes your patch,
> and that fix for the test.
I saw you sent v3 without this; never mind, I'll post it directly.

Stefano

> Maybe we can also add a new test to check exactly the problem you're
> fixing, to avoid regressions.
>
> Thanks,
> Stefano
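
On the extra test: a rough, standalone sketch of what the sender side of
such a regression check could look like is below. It is only illustrative:
the CID, port and buffer sizes are placeholder values, it assumes a peer
that advertises a huge SO_VM_SOCKETS_BUFFER_SIZE and never reads, and a
real test would live in vsock_test and use its helpers instead.

/* Connect to a slow reader that advertised a huge buffer, push data with
 * non-blocking sends, and check how much the transport accepts before
 * returning EAGAIN. With the fix, this should stay around our own buffer
 * size rather than the peer's advertised one.
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

#define PEER_CID  3		/* placeholder guest CID */
#define PEER_PORT 12345		/* placeholder port */
#define LOCAL_BUF (256 * 1024ULL)

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = PEER_CID,
		.svm_port = PEER_PORT,
	};
	unsigned long long buf_size = LOCAL_BUF;
	char chunk[4096] = { 0 };
	size_t total = 0;
	int fd;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("socket/connect");
		return EXIT_FAILURE;
	}

	/* Keep our own buffer small: after the fix this value bounds the
	 * TX credit regardless of what the peer advertises.
	 */
	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
		       &buf_size, sizeof(buf_size)) < 0)
		perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");

	/* Send until the transport runs out of credit. */
	for (;;) {
		ssize_t ret = send(fd, chunk, sizeof(chunk), MSG_DONTWAIT);

		if (ret < 0) {
			if (errno != EAGAIN && errno != EWOULDBLOCK)
				perror("send");
			break;
		}
		total += ret;
	}

	printf("queued %zu bytes before blocking (local buffer %llu)\n",
	       total, buf_size);
	close(fd);
	return EXIT_SUCCESS;
}

Without the cap, a loop like this keeps being granted credit up to the
peer's advertised buffer; with it, the total should stay in the same
ballpark as the local buffer size, which is exactly the behaviour worth
pinning down in a test.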

