On Thu 2026-6-25 04:32, syzbot wrote:
> Hello virt maintainers/developers,
> 
> This is a 31-day syzbot report for the virt subsystem.
> All related reports/information can be found at:
> https://syzkaller.appspot.com/upstream/s/virt
> 
> During the period, 0 new issues were detected and 0 were fixed.
> In total, 5 issues are still open and 61 have already been fixed.
> There are also 2 low-priority issues.
> 
> Some of the still happening issues:
> 
> Ref Crashes Repro Title
> <1> 24      No    WARNING: refcount bug in call_timer_fn (4)
>                   https://syzkaller.appspot.com/bug?extid=07dcf509f4c013e25dc5
> <2> 3       Yes   memory leak in __vsock_create (2)
>                   https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678

Hi,

This is regarding the still-open "memory leak in __vsock_create (2)"
bug (#2 in the monthly virt report, extid 1b2c9c4a0f8708082678):
  https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678

I spent some time analyzing the root cause and the previous fix
attempt; below is a summary and a direction that tested out.

== Root cause ==

The leaked object is the child socket created by
virtio_transport_recv_listen() via __vsock_create() — exactly the
allocation site kmemleak points at. The reason it never gets freed is
in the accept() error path, not in the allocation itself.

When vsock_accept() dequeues a child but the listener carries an error
(listener->sk_err, e.g. set by a failed connect() issued on the socket
before listen()), it sets vconnected->rejected = true, skips
sock_graft(), drops the dequeue reference and *relies on
vsock_pending_work()* to clean the child up.

The catch: vsock_pending_work() is never scheduled on the transports
involved here. It is only ever scheduled by vmci_transport
(vmci_transport.c:1130); virtio_transport and vsock_loopback never
schedule it. So the rejected child sits with an unreleased initial
reference (the one from sk_alloc()) plus the connected-table
reference, vsock_sk_destruct() is never reached, and the cascade —
child socket, struct cred, virtio transport, SELinux blob — all leak.
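To make the accounting concrete, here is a toy userspace model of the child's refcount on the current reject path. It is not kernel code, and every toy_* name is hypothetical. It assumes the three references discussed in this thread (sk_alloc(), the connected table, the accept queue), and it models reaching vsock_sk_destruct() as a flag that is set when the count hits zero:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the child struct sock (hypothetical, not kernel code). */
struct toy_sock {
	int refcnt;
	bool rejected;
	bool destructed;	/* models reaching vsock_sk_destruct() */
};

static void toy_sock_hold(struct toy_sock *sk)
{
	sk->refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);	/* an underflow would be a refcount bug */
	if (--sk->refcnt == 0)
		sk->destructed = true;
}

/* Child as set up by the receive path before accept() dequeues it. */
static void toy_child_create(struct toy_sock *child)
{
	child->refcnt = 1;	/* sk_alloc() */
	child->rejected = false;
	child->destructed = false;
	toy_sock_hold(child);	/* __vsock_insert_connected() */
	toy_sock_hold(child);	/* vsock_enqueue_accept() */
}

/*
 * Current reject path in vsock_accept(): mark the child rejected, drop
 * the dequeue reference, and leave the rest to vsock_pending_work().
 * virtio/loopback never schedule that work, so nothing else is dropped.
 */
static void toy_accept_reject_current(struct toy_sock *child)
{
	child->rejected = true;
	toy_sock_put(child);
}
```

After create plus reject, the count is left at 2 and the destructor flag never flips, which is the leak described above.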

The earlier commit 3a5cc90a4d17 ("vsock/virtio: remove socket from
connected/bound list on shutdown") adds an unconditional
vsock_remove_sock() in virtio_transport_recv_connected() when a
SHUTDOWN arrives. For a child that later receives a SHUTDOWN, this
drops the connected-table reference, but it does not release the
sk_alloc() reference. So the leak is not really a regression
introduced there: rejected children have never been cleaned up on
transports that don't schedule pending_work. What 3a5cc90a4d17 mainly
changes is whether kmemleak can see the leak. On v6.6 it can (the
cascade shows up). On mainline, the smaller struct sock layout leaves
a residual pointer inside the child that kmemleak counts as a
reachable reference, so mainline kmemleak stays silent even though
create/destruct accounting confirms the child never reaches
vsock_sk_destruct().

== Why the previous attempt didn't land ==

Divya's patch [1] tried to fix it by re-locking the parent listener
inside virtio_transport_recv_listen() and re-checking the shutdown
state under that lock before vsock_enqueue_accept(). But that lock is
already held: virtio_transport_recv_pkt() holds lock_sock(sk) across
the call into recv_listen(), so syzbot CI immediately flagged
"possible recursive locking" [2]. The patch was backed out and the
bug stayed open.
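The failure mode of that approach can be illustrated with a toy non-recursive lock that records its owner. This is a userspace sketch with hypothetical toy_* names, not lockdep: real lockdep reasons about lock classes rather than instances, but the instance case is enough here because the same listener lock is taken twice on one call path.

```c
#include <assert.h>

/* Toy non-recursive lock (hypothetical); lock_sock() is not recursive either. */
struct toy_lock {
	int owner;	/* 0 = unlocked, otherwise a nonzero context id */
};

enum toy_lock_result {
	TOY_LOCKED,	/* acquired */
	TOY_RECURSIVE,	/* same context re-acquiring: self-deadlock */
	TOY_CONTENDED,	/* another context holds it: caller would sleep */
};

static enum toy_lock_result toy_lock_acquire(struct toy_lock *l, int ctx)
{
	if (l->owner == ctx)
		return TOY_RECURSIVE;
	if (l->owner != 0)
		return TOY_CONTENDED;
	l->owner = ctx;
	return TOY_LOCKED;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->owner != 0);	/* releasing an unheld lock is a bug */
	l->owner = 0;
}

/*
 * Shape of the backed-out patch: the receive path locks the listener,
 * then the listen handler tries to lock the same listener again.
 */
static enum toy_lock_result toy_recv_pkt_relock(struct toy_lock *listener,
						int ctx)
{
	enum toy_lock_result outer, inner;

	outer = toy_lock_acquire(listener, ctx);	/* lock_sock(sk) */
	if (outer != TOY_LOCKED)
		return outer;
	inner = toy_lock_acquire(listener, ctx);	/* re-lock in recv_listen() */
	toy_lock_release(listener);			/* release_sock(sk) */
	return inner;
}
```

The inner acquisition always comes back as TOY_RECURSIVE, whatever the listener's state was before the outer lock.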

== A direction that tests out ==

Instead of re-locking in the receive path, handle the cleanup directly
in vsock_accept(): on reject, instead of setting vconnected->rejected
and relying on pending_work, explicitly release the child's references
there:

    if (err) {
        vsock_remove_connected(vconnected);  /* connected-table ref */
        connected->sk_state = TCP_CLOSE;
        sock_put(connected);                  /* enqueue_accept ref  */
    } else {
        sock_graft(connected, newsock);
    }
    ...
    sock_put(connected);  /* the existing, common put — sk_alloc ref */

This drops exactly the three references the child holds at dequeue
time (sk_alloc + __vsock_insert_connected + vsock_enqueue_accept), so
the refcount reaches zero and vsock_sk_destruct() runs. The `rejected`
flag and its pending_work handling can then be removed. The receive
path is not touched, so there is no re-locking and no deadlock.
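A toy userspace model of the proposed reject path (hypothetical toy_* names, not kernel code) can track each of the three references by owner, so that dropping one twice is caught as a double put rather than silently decrementing a counter. Which sock_put() drops which reference follows the comments in the snippet above; in the kernel the references are interchangeable counts.

```c
#include <assert.h>
#include <stdbool.h>

/* The three references the child holds at dequeue time. */
enum toy_ref {
	TOY_REF_SK_ALLOC  = 1u << 0,	/* sk_alloc() */
	TOY_REF_CONNECTED = 1u << 1,	/* __vsock_insert_connected() */
	TOY_REF_ACCEPTQ   = 1u << 2,	/* vsock_enqueue_accept() */
};

enum toy_state { TOY_ESTABLISHED, TOY_CLOSE };

struct toy_child {
	unsigned int held;	/* bitmask of references still held */
	enum toy_state state;
	bool destructed;	/* models reaching vsock_sk_destruct() */
};

static void toy_child_init(struct toy_child *c)
{
	c->held = TOY_REF_SK_ALLOC | TOY_REF_CONNECTED | TOY_REF_ACCEPTQ;
	c->state = TOY_ESTABLISHED;
	c->destructed = false;
}

/* Drop one named reference; returns false on a double put. */
static bool toy_put(struct toy_child *c, unsigned int ref)
{
	assert(ref == TOY_REF_SK_ALLOC || ref == TOY_REF_CONNECTED ||
	       ref == TOY_REF_ACCEPTQ);
	if (!(c->held & ref))
		return false;
	c->held &= ~ref;
	if (c->held == 0)
		c->destructed = true;
	return true;
}

/* Proposed reject path: release everything in vsock_accept() itself. */
static bool toy_accept_reject_fixed(struct toy_child *c)
{
	bool ok = true;

	ok &= toy_put(c, TOY_REF_CONNECTED);	/* vsock_remove_connected() */
	c->state = TOY_CLOSE;			/* sk_state = TCP_CLOSE */
	ok &= toy_put(c, TOY_REF_ACCEPTQ);	/* sock_put() in the err branch */
	ok &= toy_put(c, TOY_REF_SK_ALLOC);	/* the existing common sock_put() */
	return ok;
}
```

After the reject path runs, no references remain and the destructor flag is set; any further put is reported as a double put.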

I verified this on ARM64 QEMU. On Linux v6.6.y (where kmemleak can
see the leak) with the syzbot reproducer:
  - before: 6 creates / 4 destructs (2 leaked); kmemleak reports the
    cascade;
  - after:  6 creates / 6 destructs (0 leaked); kmemleak clean;
  - a 50-iteration normal-server test and a 50-iteration
    same-port-reconnect test both pass 50/50 with zero leaks and no
    double-put warnings.
On mainline, kmemleak stays silent (see above) but create/destruct
accounting confirms the same leak before the fix; the fix is
code-identical across v6.6.y and mainline (same recv_listen/accept
paths).

I don't follow the list at full volume. I'm happy to send a formal
patch (with the af_vsock.h / pending_work changes folded in) if the
maintainers think the direction looks right.

== Trigger, for completeness ==

The reproducer's atypical-but-legal sequence is what sets
listener->sk_err: a socket is connect()ed (leaving sk_err set, since
vsock_connect() only clears it at the start of a new connect) and then
turned into a listener:

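    struct sockaddr_vm peer = { .svm_family = AF_VSOCK,
                                .svm_cid = VMADDR_CID_LOCAL, /* ... */ };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    bind(fd, ...);
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));  /* leaves sk_err set */
    listen(fd, 5);
    /* a peer connects to fd; the child created is later rejected */
    accept4(fd, ...);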

Standard servers (listen before any connect on the same fd) don't hit
it, which is why ~2.5 years passed between the offending commit and
the syzbot report.
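The ordering dependency can be sketched as a toy state model (hypothetical toy_* names, userspace only). It assumes, as described above, that only the start of a new connect() clears sk_err and that listen() leaves it alone. ECONNRESET is just an example of an error a failed connect might leave behind.

```c
#include <assert.h>
#include <errno.h>

/* Toy listener state: only the fields that matter for the trigger. */
struct toy_vsock {
	int sk_err;	/* positive errno, 0 if none */
	int listening;
};

/* connect(): sk_err is cleared only at the start of a connect attempt. */
static void toy_connect(struct toy_vsock *s, int result_err)
{
	assert(result_err >= 0);	/* 0 on success, else a positive errno */
	s->sk_err = 0;
	if (result_err)
		s->sk_err = result_err;	/* set later when the attempt fails */
}

/* listen(): does not touch sk_err. */
static void toy_listen(struct toy_vsock *s)
{
	s->listening = 1;
}

/* accept(): a nonzero result means the dequeued child is rejected. */
static int toy_accept_err(const struct toy_vsock *s)
{
	return -s->sk_err;
}
```

Calling toy_connect() with an error before toy_listen() makes every later accept take the reject branch, while a listener that never connected stays at 0.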

[1] https://lore.kernel.org/all/[email protected]/
[2] https://ci.syzbot.org/series/76f40e62-5a21-46d4-a636-10f0ec9c5040

Thanks.


> <3> 3913    Yes   INFO: rcu detected stall in do_idle
>                   https://syzkaller.appspot.com/bug?extid=385468161961cee80c31
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
> 
> To disable reminders for individual bugs, reply with the following command:
> #syz set <Ref> no-reminders
> 
> To change bug's subsystems, reply with:
> #syz set <Ref> subsystems: new-subsystem
> 
> You may send multiple commands in a single email message.

