On Tue, Jun 09, 2026 at 12:48:05AM +0000, Octavian Purdila wrote:
When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".
Thanks for the fix! I have some comments.
Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:
1. A write-only VMA (PROT_WRITE only) is partially populated by a
prior TUN write that failed with -EIO but still faulted in some
pages).
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
populated.
4. The second packet allocation fails because GUP fast pins the first page
but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
advanced: 64KB from first packet + 4KB from second), but the send loop
breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
(offset resets). It attempts to send the remaining bytes with zerocopy
but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Signed-off-by: Octavian Purdila <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index b10666937c490..588623a3e2bbc 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -367,6 +367,10 @@ static int virtio_transport_send_pkt_info(struct
vsock_sock *vsk,
do {
struct sk_buff *skb;
size_t skb_len;
+ struct iov_iter saved_iter;
trivial: reverse xmas tree:
https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs
+
+ if (info->msg)
+ saved_iter = info->msg->msg_iter;
What about using iov_iter_save_state()/iov_iter_restore() ?
IIUC we may need to export iov_iter_restore(), so not a strong opinion,
but it looks better to use those API IMHO.
skb_len = min(max_skb_len, rest_len);
@@ -375,6 +379,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock
*vsk,
src_cid, src_port,
dst_cid, dst_port);
What about adding a comment on top of virtio_transport_alloc_skb() call
(or when we save the state) to explain that in specific cases it can
advance the msg_iter ?
if (!skb) {
+ if (info->msg)
+ info->msg->msg_iter = saved_iter;
ret = -ENOMEM;
break;
}
@@ -382,8 +388,11 @@ static int virtio_transport_send_pkt_info(struct
vsock_sock *vsk,
virtio_transport_inc_tx_pkt(vvs, skb);
ret = t_ops->send_pkt(skb, info->net);
- if (ret < 0)
+ if (ret < 0) {
+ if (info->msg)
+ info->msg->msg_iter = saved_iter;
Also, what about having a single restore point after the loop?
I mean something like this (untested):
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index b10666937c49..2f3c6c82c155 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock
*vsk,
u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
+ struct iov_iter_state msg_iter_state;
struct virtio_vsock_sock *vvs;
struct ubuf_info *uarg = NULL;
u32 pkt_len = info->pkt_len;
@@ -368,6 +369,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock
*vsk,
struct sk_buff *skb;
size_t skb_len;
+ if (info->msg)
+ iov_iter_save_state(&info->msg->msg_iter,
&msg_iter_state);
+
skb_len = min(max_skb_len, rest_len);
skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
@@ -399,6 +403,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock
*vsk,
break;
} while (rest_len);
+ if (info->msg && ret < 0)
+ iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
virtio_transport_put_credit(vvs, rest_len);
/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
Thanks,
Stefano
break;
+ }
/* Both virtio and vhost 'send_pkt()' returns 'skb_len',
* but for reliability use 'ret' instead of 'skb_len'.
--
2.54.0.1064.gd145956f57-goog