Hi, I'm experiencing a bug where SSH sessions over vsock take 2-20+ seconds to establish due to poll() not signaling POLLIN when data is available. The bug does NOT occur on the first connection after VM boot, but affects all subsequent connections.
* Summary

- vsock poll() fails to return POLLIN when data is in the receive buffer
- sshd-session's ppoll() times out every ~20 ms instead of waking on data
- The first SSH connection after guest boot works instantly
- All subsequent connections experience 2-20+ second delays
- Non-PTY commands (ssh -T ... 'echo test') work instantly
- TCP connections to the same VM work instantly

* Environment

Host:
- OS: Arch Linux
- Kernel: 6.18.2-arch2-1
- QEMU: system package (latest)

Guest:
- OS: Debian trixie
- Kernel: 6.17.13+deb13-amd64 (also tested on 6.12.57, same issue)
- OpenSSH: 10.0p2

QEMU command (relevant parts):

  qemu-system-x86_64 -enable-kvm -smp 8 \
    -object memory-backend-memfd,id=mem,size=20G,share=on \
    -machine memory-backend=mem \
    -device vhost-vsock-pci,guest-cid=5 \
    ...

Connection method: ssh user@vsock/5 (via systemd-ssh-proxy)

* Symptoms

Interactive SSH (PTY) - SLOW:

  $ time ssh user@vsock/5
  # Takes 2-20+ seconds before the shell prompt appears

Non-interactive SSH - FAST:

  $ time ssh user@vsock/5 'echo test'
  test
  real    0m0.156s

TCP to the same VM - FAST:

  $ time ssh -p 33594 [email protected]
  # Instant

* Key observation: first connection after boot is fast

After a guest reboot:

  $ ssh user@vsock/5   # INSTANT (< 1 second)
  $ exit
  $ ssh user@vsock/5   # SLOW (2-20 seconds)
  $ ssh user@vsock/5   # SLOW
  ...

This suggests the bug involves state that accumulates, or isn't properly
cleaned up, between connections.

** bpftrace evidence

Using syscall tracepoints on the guest during a slow connection (full
script at the end of this mail):

  === MINIMAL VSOCK DIAGNOSTIC ===
  [   29 ms] sshd-session: ppoll() duration=19 ms ret=1
        ^^^ 20ms TIMEOUT pattern detected!
  [   50 ms] sshd-session: ppoll() duration=20 ms ret=1
        ^^^ 20ms TIMEOUT pattern detected!
  [   70 ms] sshd-session: ppoll() duration=18 ms ret=1
        ^^^ 20ms TIMEOUT pattern detected!
  ... (continues for ~2 seconds) ...
  [ 5000 ms] --- 5s stats: ppoll=455, timeouts=103, recv=0 (0 bytes) ---
  [19432 ms] sshd: recvmsg() = 308 bytes [4 µs]
  [19442 ms] sshd-session: recvmsg() = 308 bytes [4 µs]

Pattern analysis:

- ppoll() returns ret=1 (one fd ready) but takes ~20 ms, i.e. its timeout
- the ready fd is the PTY, NOT the vsock socket
- recv=0 during the timeout phase: vsock data is not being read
- recvmsg() finally succeeds after ~19 seconds
- when recvmsg() does run, it completes in 4 microseconds (the data WAS there)

This shows that data is sitting in the vsock receive buffer, but poll()
is not returning POLLIN, so sshd doesn't know to read it.

* 30-second summary from bpftrace

  Total ppoll calls:       488
  Timeouts (20ms pattern): 103
  Successful recvmsg:      6 (984 bytes)
  Timeout rate:            21%

* Why PTY-specific?

PTY sessions require continuous bidirectional traffic:

1. Server sends the shell prompt → client must receive it
2. Client sends a keypress → server must receive it
3. Server sends the echo → client must receive it

Each exchange relies on poll() waking on POLLIN. The bug makes poll()
miss the wakeup, forcing sshd to fall back on its ~20 ms timeout for
every exchange. Non-PTY commands perform a single request/response and
exit before the accumulated delay becomes noticeable.

* Additional context

I previously encountered an identical issue with WSL2's Hyper-V vsock
implementation, which suggests this may be a general problem with how
vsock transports handle poll/wakeup semantics rather than something
specific to virtio.

* Hypothesis

Based on the evidence, this looks like a lost-wakeup race condition:

1. Host sends a packet to the guest
2. The packet is enqueued on the socket's rx_queue
3. sk_data_ready() is called, but poll waiters aren't properly woken
4. vsock_poll() returns 0 (no POLLIN) despite data being available
5. ppoll() times out after ~20 ms and sshd retries
6. The exchange eventually succeeds through timeout-based retry

The "first connection works" pattern suggests the race involves state
left over from previous connections - possibly worker threads, interrupt
handlers, or virtqueue state that isn't properly reset. A sketch of the
suspected pattern follows.
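To make the suspected pattern concrete, here is a deliberately
simplified sketch - NOT the actual vsock code; racy_poll() and
queue_has_data() are placeholder names - of how a poll implementation
loses a wakeup when the ready condition is sampled before the waiter is
registered:

/*
 * HYPOTHETICAL sketch of a lost-wakeup race. This is NOT the actual
 * vsock code: racy_poll() and queue_has_data() are placeholders.
 */
#include <linux/poll.h>
#include <net/sock.h>

bool queue_has_data(struct sock *sk);  /* placeholder: "rx_queue non-empty" */

static __poll_t racy_poll(struct file *file, struct socket *sock,
                          poll_table *wait)
{
        __poll_t mask = 0;

        if (queue_has_data(sock->sk))         /* (1) false: queue still empty */
                mask |= EPOLLIN | EPOLLRDNORM;

        /* (2) <-- packet is enqueued and sk_data_ready() fires here;
         *         the poller is not yet on sk_sleep(), so the wakeup
         *         finds no waiters and the event is lost */

        poll_wait(file, sk_sleep(sock->sk), wait);  /* (3) registered too late */

        return mask;   /* (4) 0 => ppoll() sleeps for its full ~20 ms timeout */
}

As far as I can tell the real vsock_poll() does register the waiter up
front, so if this hypothesis is right the window is presumably subtler -
for example the rx_queue update and the has-sleepers check in the
data-ready path racing without the required memory barrier between them
- but the observable failure mode (data queued, no POLLIN, ppoll()
completing only on timeout) would be identical.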
* Reproducer

1. Start a QEMU VM with a vhost-vsock-pci device
2. Boot the guest, ensure sshd is running
3. From the host: ssh user@vsock/<CID>   # first connection is fast
4. Exit and reconnect: ssh user@vsock/<CID>   # now slow

* Request

Could someone familiar with the vsock/virtio poll implementation review
the wakeup path? Specifically:

- the virtio_transport_recv_pkt() -> sk_data_ready() path
- vsock_poll() -> poll_wait() registration timing
- any state that persists between connections

Happy to provide additional traces or test patches.

Thanks,
[Your Name]

---

bpftrace script used (runs on the guest):

#!/usr/bin/env bpftrace

BEGIN {
    @start = nsecs;
    printf("=== MINIMAL VSOCK DIAGNOSTIC ===\n");
}

tracepoint:syscalls:sys_enter_ppoll {
    if (comm == "sshd-session" || comm == "sshd") {
        @ppoll_enter[tid] = nsecs;
        @ppoll_count++;
    }
}

tracepoint:syscalls:sys_exit_ppoll {
    if (@ppoll_enter[tid]) {
        $ms = (nsecs - @start) / 1000000;
        $dur = (nsecs - @ppoll_enter[tid]) / 1000000;
        if ($dur > 10) {
            printf("[%5lld ms] %s: ppoll() duration=%lld ms ret=%d\n",
                   $ms, comm, $dur, args->ret);
            if ($dur >= 18 && $dur <= 25) {
                printf("      ^^^ 20ms TIMEOUT pattern detected!\n");
                @timeout_count++;
            }
        }
        delete(@ppoll_enter[tid]);
    }
}

tracepoint:syscalls:sys_exit_recvmsg {
    if (comm == "sshd-session" || comm == "sshd") {
        if (args->ret > 0) {
            $ms = (nsecs - @start) / 1000000;
            printf("[%5lld ms] %s: recvmsg() = %lld bytes\n",
                   $ms, comm, args->ret);
            @recv_count++;
            @recv_bytes += args->ret;
        }
    }
}

interval:s:5 {
    printf("\n[%5lld ms] --- 5s stats: ppoll=%d, timeouts=%d, recv=%d (%d bytes) ---\n\n",
           (nsecs - @start) / 1000000,
           @ppoll_count, @timeout_count, @recv_count, @recv_bytes);
}
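In case a reproducer with sshd out of the loop is useful, below is an
untested sketch of a minimal guest-side probe that measures how long
poll() takes to report POLLIN on a single vsock connection. The port
number (12345) is arbitrary; feeding it from the host with, say, one
byte per second through socat's VSOCK-CONNECT address (available in
recent socat builds) should be enough to exercise the wakeup path.

/* vsock-poll-probe.c -- minimal guest-side poll-latency probe
 * (untested sketch). Accepts one vsock connection, then reports how
 * long each poll() call slept before signaling POLLIN.
 *
 * Build (guest):  cc -o vsock-poll-probe vsock-poll-probe.c
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

#define PROBE_PORT 12345        /* arbitrary test port */

static double now_ms(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid    = VMADDR_CID_ANY,
                .svm_port   = PROBE_PORT,
        };
        int lfd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (lfd < 0 ||
            bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(lfd, 1) < 0) {
                perror("vsock listen setup");
                return 1;
        }

        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) {
                perror("accept");
                return 1;
        }

        for (;;) {
                struct pollfd pfd = { .fd = cfd, .events = POLLIN };
                char buf[64];
                double t0 = now_ms();
                int n = poll(&pfd, 1, 30000);   /* 30 s timeout */
                double waited = now_ms() - t0;

                if (n <= 0) {
                        printf("poll ret=%d after %.1f ms (timeout/error)\n",
                               n, waited);
                        continue;
                }

                ssize_t r = read(cfd, buf, sizeof(buf));
                if (r <= 0)
                        break;                  /* peer closed */
                printf("POLLIN after %8.1f ms, read %zd bytes\n", waited, r);
        }
        return 0;
}

With a healthy wakeup path the reported waits should track the sender's
interval; if the wakeup is lost, poll() should be seen returning only
after its full 30 s timeout even though the subsequent read() succeeds
immediately - the same signature the bpftrace data shows for sshd.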
