> On May 5, 2025, at 9:40 PM, patchwork-bot+netdev...@kernel.org wrote:
>
> Hello:
>
> This patch was applied to netdev/net-next.git (main)
> by Jakub Kicinski <k...@kernel.org>:
Hey all,

Writing to send up a flare and point out a problem that we're seeing with
this patch internally, specifically when we enable iommu on the virtio-net
device.

With this patch applied on a 6.12.y-based bare metal instance, and then
starting a 6.12.y-based guest with iommu enabled, we see lockups within
the guest in short order, as well as the vmm (qemu) stuck in a tight loop
responding to iommu misses from the vhost-net loop.

We've bisected this in our internal tree, and it is definitely this patch
that introduces the problem, so I wanted to point out that there is some
sort of issue here. Working on trying to figure this out, but if anything
jumps off the page to anyone, happy to take advice!

Flamegraph:
https://gist.github.com/JonKohler/0e83c014230ab59ddc950f10441335f1#file-iotlb-lockup-svg

Guest dmesg errors like so:

[   66.081694] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5500 ms
[   68.145155] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 7560000 usecs ago
[  112.907012] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5568 ms
[  124.117540] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 16776000 usecs ago
[  124.118050] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 16776 ms
[  124.118447] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 16776000 usecs ago

Host-level top output:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3992758 qemu      20   0   16.6g  52168  26704 R  99.9   0.0  21:23.72 qemu-kvm       <<< this is the qemu main thread
3992769 qemu      20   0   16.6g  52168  26704 R  58.8   0.0  13:33.44 vhost-3992758  <<< this is the vhost-net kthread

For the qemu-kvm main thread:

Samples: 13K of event 'cycles:P', 4000 Hz, Event count (approx.): 5131922583 lost: 0/0 drop: 0/0
  Children      Self  Shared Object  Symbol
-   87.41%     0.30%  [kernel]       [k] entry_SYSCALL_64_after_hwframe
   - 87.11% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         - 44.79% ksys_write
            - 43.74% vfs_write
               - 40.96% vhost_chr_write_iter
                  - 38.22% vhost_process_iotlb_msg
                     - 13.72% vhost_iotlb_add_range_ctx
                        - 7.43% vhost_iotlb_map_free
                           - 4.37% vhost_iotlb_itree_remove
                                rb_next
                             1.78% __rb_erase_color
                             0.73% kfree
                          1.15% __rb_insert_augmented
                          0.68% __kmalloc_cache_noprof
                     - 10.73% vhost_vq_work_queue
                        - 7.65% try_to_wake_up
                           - 2.55% ttwu_queue_wakelist
                              - 1.72% __smp_call_single_queue
                                   1.36% call_function_single_prep_ipi
                           - 1.32% __task_rq_lock
                              - _raw_spin_lock
                                   native_queued_spin_lock_slowpath
                           - 1.30% select_task_rq
                              - select_task_rq_fair
                                 - 0.88% wake_affine
                                      available_idle_cpu
                          2.06% llist_add_batch
                     - 4.05% __mutex_lock.constprop.0
                          2.14% mutex_spin_on_owner
                          0.72% osq_lock
                       3.00% mutex_lock
                     - 1.72% kfree
                        - 1.16% __slab_free
                             slab_update_freelist.constprop.0.isra.0
                       1.37% _raw_spin_lock
                       1.08% mutex_unlock
                    1.98% _copy_from_iter
               - 1.86% rw_verify_area
                  - security_file_permission
                     - 1.13% file_has_perm
                          0.69% avc_has_perm
              0.63% fdget_pos
         - 27.86% syscall_exit_to_user_mode
            - syscall_exit_to_user_mode_prepare
               - 25.96% __audit_syscall_exit
                  - 25.03% __audit_filter_op
                       6.66% audit_filter_rules.constprop.0
                    1.27% audit_reset_context.part.0.constprop.0
         - 10.86% ksys_read
            - 9.37% vfs_read
               - 6.67% vhost_chr_read_iter
                    1.48% _copy_to_iter
                    1.36% _raw_spin_lock
                  - 1.30% __wake_up
                       0.81% _raw_spin_lock_irqsave
                  - 1.25% vhost_enqueue_msg
                       _raw_spin_lock
               - 1.83% rw_verify_area
                  - security_file_permission
                     - 1.03% file_has_perm
                          0.64% avc_has_perm
              0.65% fdget_pos
              0.57% fput
         - 2.56% syscall_trace_enter
            - 1.25% __seccomp_filter
                 seccomp_run_filters
              0.54% __audit_syscall_entry
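For context on what that profile is showing: the qemu main thread is
spinning in the device-IOTLB miss/update protocol on the vhost fd, i.e.
read() a VHOST_IOTLB_MISS message, translate the iova, write() back a
VHOST_IOTLB_UPDATE (the vhost_chr_write_iter -> vhost_process_iotlb_msg
path above). A minimal userspace sketch of that cycle, assuming the v1
struct vhost_msg layout from <linux/vhost_types.h>; translate() is a
hypothetical stand-in for the VMM's vIOMMU lookup, and error handling is
elided:

    #include <linux/vhost.h>        /* VHOST_IOTLB_MSG */
    #include <linux/vhost_types.h>  /* struct vhost_msg, vhost_iotlb_msg */
    #include <unistd.h>

    /* Hypothetical helper, not a real API: fill in uaddr/size/perm
     * for the faulting iova, return 0 on success. */
    int translate(__u64 iova, struct vhost_iotlb_msg *out);

    static void iotlb_service_loop(int vhost_fd)
    {
            struct vhost_msg msg;

            for (;;) {
                    /* Blocks until vhost posts a message; a device-IOTLB
                     * miss arrives as iotlb.type == VHOST_IOTLB_MISS.
                     * This is the vhost_chr_read_iter path in the
                     * profile. */
                    if (read(vhost_fd, &msg, sizeof(msg)) != sizeof(msg))
                            break;
                    if (msg.type != VHOST_IOTLB_MSG ||
                        msg.iotlb.type != VHOST_IOTLB_MISS)
                            continue;

                    struct vhost_msg reply = {
                            .type = VHOST_IOTLB_MSG,
                            .iotlb = {
                                    .iova = msg.iotlb.iova,
                                    .type = VHOST_IOTLB_UPDATE,
                            },
                    };
                    if (translate(msg.iotlb.iova, &reply.iotlb))
                            continue;

                    /* This write lands in vhost_chr_write_iter ->
                     * vhost_process_iotlb_msg, which inserts the entry
                     * into the rbtree via vhost_iotlb_add_range_ctx. */
                    write(vhost_fd, &reply, sizeof(reply));
            }
    }

If entries keep getting evicted (note the vhost_iotlb_map_free time under
vhost_iotlb_add_range_ctx above), this whole round trip runs per miss,
which would line up with the main thread pegged at ~100% in read/write
syscalls.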
For the vhost-net thread:

Samples: 20K of event 'cycles:P', 4000 Hz, Event count (approx.): 7796456297 lost: 0/0 drop: 0/0
  Children      Self  Shared Object  Symbol
- 100.00%     3.38%  [kernel]       [k] vhost_task_fn
     38.26% 0xffffffff930bb8c0
   - 3.36% 0
        ret_from_fork_asm
        ret_from_fork
      - 1.16% vhost_task_fn
         - 2.35% vhost_run_work_list
            - 1.67% handle_tx
               - 7.09% __mutex_lock.constprop.0
                    6.64% mutex_spin_on_owner
               - 0.84% vq_meta_prefetch
                  - 3.22% iotlb_access_ok
                       2.50% vhost_iotlb_itree_first
                 0.80% mutex_lock
               - 0.75% handle_tx_copy
              0.86% llist_reverse_order

> On Wed, 30 Apr 2025 19:04:28 -0700 you wrote:
>> In handle_tx_copy, TX batching processes packets below ~PAGE_SIZE and
>> batches up to 64 messages before calling sock->sendmsg.
>>
>> Currently, when there are no more messages on the ring to dequeue,
>> handle_tx_copy re-enables kicks on the ring *before* firing off the
>> batch sendmsg. However, sock->sendmsg incurs a non-zero delay,
>> especially if it needs to wake up a thread (e.g., another vhost worker).
>>
>> [...]
>
> Here is the summary with links:
>   - [net-next,v3] vhost/net: Defer TX queue re-enable until after sendmsg
>     https://git.kernel.org/netdev/net-next/c/8c2e6b26ffe2
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
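P.S. To make the suspect interaction easier to eyeball, the reordering
described in the quoted changelog boils down to roughly the following in
handle_tx_copy(). This is a paraphrased sketch of the control flow, not
the literal hunk; vhost_tx_batch() and vhost_enable_notify() are the real
helpers, but their placement here is reconstructed from the changelog
text:

    /* Before the patch (paraphrased): on an empty ring, kicks were
     * re-enabled first, and the final batched sendmsg went out after. */
    if (head == vq->num) {
            vhost_enable_notify(&net->dev, vq); /* guest may kick again */
            break;
    }
    /* ... */
    vhost_tx_batch(net, nvq, sock, &msg);       /* batch flushed later */

    /* After the patch (paraphrased): flush the batch first, and only
     * then re-enable notifications. */
    if (head == vq->num)
            break;                              /* kicks stay disabled */
    /* ... */
    vhost_tx_batch(net, nvq, sock, &msg);       /* sendmsg happens first */
    vhost_enable_notify(&net->dev, vq);         /* then re-enable kicks */

With iommu enabled, that final vhost_tx_batch() can itself stall on
device-IOTLB misses while kicks are still off, which is the window we're
staring at; no firm conclusion yet.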