On Thu 2026-6-25 04:32, syzbot wrote:
> Hello virt maintainers/developers,
>
> This is a 31-day syzbot report for the virt subsystem.
> All related reports/information can be found at:
> https://syzkaller.appspot.com/upstream/s/virt
>
> During the period, 0 new issues were detected and 0 were fixed.
> In total, 5 issues are still open and 61 have already been fixed.
> There are also 2 low-priority issues.
>
> Some of the still happening issues:
>
> Ref Crashes Repro Title
> <1> 24 No WARNING: refcount bug in call_timer_fn (4)
> https://syzkaller.appspot.com/bug?extid=07dcf509f4c013e25dc5
> <2> 3 Yes memory leak in __vsock_create (2)
> https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678

Hi,

This is regarding the still-open "memory leak in __vsock_create (2)"
bug (#2 in the monthly virt report, extid 1b2c9c4a0f8708082678):
https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678

I spent some time analyzing the root cause and the previous fix
attempt; below is a summary and a direction that held up in testing.

== Root cause ==

The leaked object is the child socket created by
virtio_transport_recv_listen() via __vsock_create(), which is exactly
the allocation site kmemleak points at. The reason it never gets freed
is in the accept() error path, not in the allocation itself.

When vsock_accept() dequeues a child but the listener carries an error
(listener->sk_err, e.g. set by a failed connect() issued on the socket
before listen()), it sets vconnected->rejected = true, skips
sock_graft(), drops the dequeue reference and *relies on
vsock_pending_work()* to clean the child up.

The catch: vsock_pending_work() is never scheduled on the transports
involved here. Only vmci_transport schedules it
(vmci_transport.c:1130); virtio_transport and vsock_loopback never do.
So the rejected child sits with an unreleased initial reference (the
one from sk_alloc()) plus the connected-table reference,
vsock_sk_destruct() is never reached, and the whole cascade (child
socket, struct cred, virtio transport, SELinux blob) leaks.

The earlier commit 3a5cc90a4d17 ("vsock/virtio: remove socket from
connected/bound list on shutdown") adds an unconditional
vsock_remove_sock() in virtio_transport_recv_connected() when a
SHUTDOWN arrives. That drops the connected-table reference for a
child that later receives a SHUTDOWN, but it does not release the
sk_alloc() reference. So the leak is not really a regression
introduced there: rejected children have never been cleaned up on
transports that don't schedule pending_work. 3a5cc90a4d17 mainly
changes whether kmemleak can see the leak. On v6.6 it can (the
cascade shows up). On mainline the smaller struct sock layout leaves a
residual pointer inside the child that kmemleak counts as a reachable
reference, so mainline kmemleak stays silent even though
create/destruct accounting confirms the child never reaches
vsock_sk_destruct().

== Why the previous attempt didn't land ==

Divya's patch [1] tried to fix it by re-locking the parent listener
inside virtio_transport_recv_listen() and re-checking the shutdown
state under that lock before vsock_enqueue_accept(). That re-acquires
a lock the caller already holds: virtio_transport_recv_pkt() holds
lock_sock(sk) across the call into recv_listen(). syzbot ci
immediately flagged "possible recursive locking" [2], so the patch was
backed out and the bug stayed open.
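
To make the locking problem concrete, here is a rough userspace model,
not kernel code. All model_* names are made up for illustration, and
the -EDEADLK return is my own assumption to make the failure
observable: the real lock_sock() does not return an error here, and
what syzbot ci reported was the "possible recursive locking" warning.
The sketch only shows that the callee tries to take a lock its caller
still holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock standing in for the listener's
 * socket lock. */
struct model_lock {
    bool held;
};

static int model_lock_acquire(struct model_lock *l)
{
    if (l->held)
        return -EDEADLK; /* assumption: report instead of blocking */
    l->held = true;
    return 0;
}

static void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Models recv_listen(); 'relock' selects the behaviour of the
 * previous fix attempt (take the listener lock again). */
static int model_recv_listen(struct model_lock *listener, bool relock)
{
    if (relock) {
        int err = model_lock_acquire(listener);

        if (err)
            return err;
        /* ... re-check shutdown state, enqueue the child ... */
        model_lock_release(listener);
    }
    return 0;
}

/* Models recv_pkt(): holds the listener lock across recv_listen(). */
static int model_recv_pkt(struct model_lock *listener, bool relock)
{
    int err = model_lock_acquire(listener);

    if (err)
        return err;
    err = model_recv_listen(listener, relock);
    model_lock_release(listener);
    return err;
}
```

With relock == false the call chain completes; with relock == true the
inner acquire fails because the outer one is still in effect, which is
the shape of the problem described above.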

== A direction that tests out ==

Instead of re-locking in the receive path, handle the cleanup directly
in vsock_accept(): on reject, instead of setting vconnected->rejected
and relying on pending_work, explicitly release the child's references
there:

    if (err) {
        vsock_remove_connected(vconnected); /* connected-table ref */
        connected->sk_state = TCP_CLOSE;
        sock_put(connected); /* enqueue_accept ref */
    } else {
        sock_graft(connected, newsock);
    }
    ...
    sock_put(connected); /* the existing, common put: sk_alloc ref */

This drops exactly the three references the child holds at dequeue
time (sk_alloc + __vsock_insert_connected + vsock_enqueue_accept),
lets the refcount reach zero and vsock_sk_destruct() run. The
`rejected` flag and its pending_work handling can then be removed. The
receive path is not touched, so there is no re-locking and no deadlock.
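
The reference arithmetic can be checked with a small userspace model.
This is a sketch only: struct model_sock and the model_* helpers are
hypothetical stand-ins I made up, and the three initial references are
taken from the description above rather than from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the child socket; not kernel code. */
struct model_sock {
    int refcnt;        /* models the socket refcount */
    bool in_connected; /* models membership in the connected table */
    bool destructed;   /* set when the last reference is dropped */
};

static void model_hold(struct model_sock *sk)
{
    sk->refcnt++;
}

static void model_put(struct model_sock *sk)
{
    assert(sk->refcnt > 0); /* would be a double put otherwise */
    if (--sk->refcnt == 0)
        sk->destructed = true; /* stands in for vsock_sk_destruct() */
}

/* Child as it sits on the accept queue: three references. */
static struct model_sock model_child_create(void)
{
    struct model_sock sk = { .refcnt = 1 }; /* sk_alloc() ref */

    model_hold(&sk); /* __vsock_insert_connected() ref */
    sk.in_connected = true;
    model_hold(&sk); /* vsock_enqueue_accept() ref */
    return sk;
}

static void model_remove_connected(struct model_sock *sk)
{
    if (sk->in_connected) {
        sk->in_connected = false;
        model_put(sk);
    }
}

/* Reject path as described for the current code: only the common put
 * runs, and nothing schedules the deferred cleanup. */
static void model_accept_reject_current(struct model_sock *sk)
{
    model_put(sk);
}

/* Reject path with the proposed change. */
static void model_accept_reject_proposed(struct model_sock *sk)
{
    model_remove_connected(sk); /* connected-table ref */
    model_put(sk);              /* extra put added on reject */
    model_put(sk);              /* the existing, common put */
}
```

In this model the current reject path leaves two references behind,
while the proposed one reaches zero and triggers the destructor
stand-in exactly once.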

I verified this on ARM64 QEMU. On linux v6.6.y (where kmemleak can
see the leak) with the syzbot reproducer:

  - before: 6 creates / 4 destructs (2 leaked); kmemleak reports the
    cascade;
  - after: 6 creates / 6 destructs (0 leaked); kmemleak clean;
  - 50-iter normal server and 50-iter same-port-reconnect tests both
    pass 50/50 with zero leaks and no double-put warnings.

On mainline, kmemleak stays silent (see above) but create/destruct
accounting confirms the same leak before the fix; the fix is
code-identical across v6.6.y and mainline (same recv_listen/accept
paths).

I don't follow the list at full volume; happy to send a formal patch
(with the af_vsock.h / pending_work changes folded in) if the
direction looks right to the maintainers.

== Trigger, for completeness ==

The reproducer's atypical-but-legal sequence is what sets
listener->sk_err: a connect() is attempted on the socket and fails
(leaving sk_err set, since vsock_connect() only clears it at the start
of a new connect), and the socket is then turned into a listener:

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    bind(fd, ...);
    connect(fd, &(VMADDR_CID_LOCAL, ...)); /* leaves sk_err set */
    listen(fd, 5);
    /* a peer connects to fd; the child created is later rejected */
    accept4(fd, ...);

Standard servers (listen before any connect on the same fd) don't hit
it, which is why this went ~2.5 years between the offending commit and
the syzbot report.
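
For anyone who wants to poke at the sequence by hand, here is a
compilable sketch of the listener-side setup. It is not the syzbot
reproducer: the port numbers, the bind to VMADDR_CID_ANY and the
example_* names are my assumptions, and I have not checked that this
exact program triggers the leak. It assumes the connect() fails
because nothing listens on the chosen peer port.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

#ifndef VMADDR_CID_LOCAL
#define VMADDR_CID_LOCAL 1U /* fallback for older uapi headers */
#endif

#define EXAMPLE_LISTEN_PORT 1234U /* hypothetical */
#define EXAMPLE_PEER_PORT   4321U /* hypothetical; assumed unused */

/* Build a zeroed AF_VSOCK address for the given cid/port. */
static struct sockaddr_vm example_vm_addr(unsigned int cid,
                                          unsigned int port)
{
    struct sockaddr_vm addr;

    memset(&addr, 0, sizeof(addr));
    addr.svm_family = AF_VSOCK;
    addr.svm_cid = cid;
    addr.svm_port = port;
    return addr;
}

/* bind(), attempt a connect(), then listen() on the same fd.
 * Returns the listening fd, or -1 on error. */
static int example_listener_after_connect(void)
{
    struct sockaddr_vm self = example_vm_addr(VMADDR_CID_ANY,
                                              EXAMPLE_LISTEN_PORT);
    struct sockaddr_vm peer = example_vm_addr(VMADDR_CID_LOCAL,
                                              EXAMPLE_PEER_PORT);
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&self, sizeof(self)) < 0)
        goto fail;
    /* Assumed to fail; the result is deliberately ignored. */
    (void)connect(fd, (struct sockaddr *)&peer, sizeof(peer));
    if (listen(fd, 5) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

A second process (or thread) would then connect to
EXAMPLE_LISTEN_PORT while the first calls accept4() on the returned
fd, matching the last two steps of the sequence above.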

[1] https://lore.kernel.org/all/[email protected]/
[2] https://ci.syzbot.org/series/76f40e62-5a21-46d4-a636-10f0ec9c5040

Thanks.

> <3> 3913 Yes INFO: rcu detected stall in do_idle
> https://syzkaller.appspot.com/bug?extid=385468161961cee80c31
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> To disable reminders for individual bugs, reply with the following command:
> #syz set <Ref> no-reminders
>
> To change bug's subsystems, reply with:
> #syz set <Ref> subsystems: new-subsystem
>
> You may send multiple commands in a single email message.