On 2026-03-17 17:25:20 [+0000], Michael Kelley wrote:
> From: Sebastian Andrzej Siewior <[email protected]> Sent: Thursday, March
> 12, 2026 10:07 AM
> >
>
> Let me try to address the range of questions here and in the follow-up
> discussion. As background, an overview of VMBus interrupt handling is in:
>
> Documentation/virt/hyperv/vmbus.rst
>
> in the section entitled "Synthetic Interrupt Controller (synic)". The
> relevant text is:
>
> The SINT is mapped to a single per-CPU architectural interrupt (i.e,
> an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
> each CPU in the guest has a synic and may receive VMBus interrupts,
> they are best modeled in Linux as per-CPU interrupts. This model works
> well on arm64 where a single per-CPU Linux IRQ is allocated for
> VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
> "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
> interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
> across all CPUs and explicitly coded to call vmbus_isr(). In this case,
> there's no Linux IRQ, and the interrupts are visible in aggregate in
> /proc/interrupts on the "HYP" line.
>
> The use of a statically allocated sysvec pre-dates my involvement in this
> code starting in 2017, but I believe it was modelled after what Xen does,
> and for the same reason -- to effectively create a per-CPU interrupt on
> x86/x64. ACRN is also using HYPERVISOR_CALLBACK_VECTOR, but I
> don't know if that is also to create a per-CPU interrupt.

If you create a vector, it becomes per-CPU. There is simply no mapping
from HYPERVISOR_CALLBACK_VECTOR to request_percpu_irq(). But if we had
this…

…
> > What clears this? This is wrongly placed. This should go to
> > sysvec_hyperv_callback() instead with its matching canceling part. The
> > add_interrupt_randomness() should also be there and not here.
> > sysvec_hyperv_stimer0() managed to do so.
>
> I don't have any knowledge to bring regarding the use of
> lockdep_hardirq_threaded().

It is used in the IRQ core to mark the execution of an interrupt handler
which becomes threaded in a forced-threaded scenario. The goal is to let
lockdep know that this piece of code on !RT will be threaded on RT and
therefore there is no need to report a possible locking problem that
will not exist on RT.

> > Different question: What guarantees that there won't be another
> > interrupt before this one is done? The handshake appears to be
> > deprecated. The interrupt itself returns ACKing (or not) but the actual
> > handler is delayed to this thread. Depending on the userland it could
> > take some time and I don't know how impatient the host is.
>
> In more recent versions of Hyper-V, what's deprecated is Hyper-V implicitly
> and automatically doing the EOI. So in sysvec_hyperv_callback(), apic_eoi()
> is usually explicitly called to ack the interrupt.
>
> There's no guarantee, in either the existing case or the new PREEMPT_RT
> case, that another VMBus interrupt won't come in on the same CPU
> before the tasklets scheduled by vmbus_message_sched() or
> vmbus_chan_sched() have run. From a functional standpoint, the Linux
> code and interaction with Hyper-V handles another interrupt correctly.

So there is no scenario in which the host will trigger interrupts because
the guest is leaving the ISR without doing anything or making progress?

> From a delay standpoint, there's not a problem for the normal (i.e., not
> PREEMPT_RT) case because the tasklets run as the interrupt exits -- they
> don't end up in ksoftirqd. For the PREEMPT_RT case, I can see your point
> about delays since the tasklets are scheduled from the new per-CPU thread.
> But my understanding is that Jan's motivation for these changes is not to
> achieve true RT behavior, since Hyper-V doesn't provide that anyway.
> The goal is simply to make PREEMPT_RT builds functional, though Jan may
> have further comments on the goal.

I would be worried if the host started storming the guest with
interrupts because the guest makes no progress.

> > > +	__vmbus_isr();
> > Moving on. This (trying very hard here) even schedules tasklets. Why?
> > You need to disable BH before doing so. Otherwise it ends in ksoftirqd.
> > You don't want that.
>
> Again, Jan can comment on the impact of delays due to ending up
> in ksoftirqd.

My point is that having this with threaded interrupt support would
eliminate the usage of tasklets.

> > Couldn't the whole logic be integrated into the IRQ code? Then we could
> > have mask/ unmask if supported/ provided and threaded interrupts. Then
> > sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
> > instead apic_eoi() + schedule_delayed_work().
>
> As I described above, Hyper-V needs a per-CPU interrupt. It's faked up
> on x86/x64 with the hardcoded HYPERVISOR_CALLBACK_VECTOR sysvec
> entry, but on arm64 a normal Linux per-CPU IRQ is used. Once the execution
> path gets to vmbus_isr(), the two architectures share the same code. Same
> thing is done with the Hyper-V STIMER0 interrupt as a per-CPU interrupt.

This one has the "random" collecting in the right spot.

> If there's a better way to fake up a per-CPU interrupt on x86/x64, I'm open
> to looking at it.
>
> As I recently discovered in discussion with Jan, standard Linux IRQ handling
> will *not* thread per-CPU interrupts. So even on arm64 with a standard
> Linux per-CPU IRQ is used for VMBus and STIMER0 interrupts, we can't
> request threading.

It would require a statement from the x86 & IRQ maintainers on whether it
is worth allowing HYPERVISOR_CALLBACK_VECTOR to be passed to
request_percpu_irq() on x86, with an IRQF_ flag saying that this one
needs to be force-threaded. Otherwise we would need to stay with the
workarounds.

If you say that an interrupt storm can not occur, I would prefer

|static DEFINE_WAIT_OVERRIDE_MAP(vmbus_map, LD_WAIT_CONFIG);
|…
|	lock_map_acquire_try(&vmbus_map);
|	__vmbus_isr();
|	lock_map_release(&vmbus_map);

while it has mostly the same effect. Either way, that
add_interrupt_randomness() should be moved to sysvec_hyperv_callback()
like it has been done for sysvec_hyperv_stimer0(). It should currently be
invoked twice if it gets there via vmbus_percpu_isr().

> I need to refresh my memory on sysvec_hyperv_reenlightenment(). If
> I recall correctly, it's not a per-CPU interrupt, so it probably doesn't
> need to have a hardcoded vector. Overall, the Hyper-V reenlightenment
> functionality is a bit of a fossil that isn't needed on modern x86/x64
> processors that support TSC scaling. And it doesn't exist for arm64.
> It might be worth seeing if it could be dropped entirely ...
>
> Michael

Sebastian

