On 18.03.26 11:01, Sebastian Andrzej Siewior wrote:
> On 2026-03-17 17:25:20 [+0000], Michael Kelley wrote:
>> From: Sebastian Andrzej Siewior <[email protected]> Sent: Thursday,
>> March 12, 2026 10:07 AM
>>
>> Let me try to address the range of questions here and in the follow-up
>> discussion. As background, an overview of VMBus interrupt handling is in:
>>
>> Documentation/virt/hyperv/vmbus.rst
>>
>> in the section entitled "Synthetic Interrupt Controller (synic)". The
>> relevant text is:
>>
>> The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
>> an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
>> each CPU in the guest has a synic and may receive VMBus interrupts,
>> they are best modeled in Linux as per-CPU interrupts. This model works
>> well on arm64 where a single per-CPU Linux IRQ is allocated for
>> VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ
>> labelled "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs,
>> an x86 interrupt vector is statically allocated
>> (HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
>> call vmbus_isr(). In this case, there's no Linux IRQ, and the interrupts
>> are visible in aggregate in /proc/interrupts on the "HYP" line.
>>
>> The use of a statically allocated sysvec pre-dates my involvement in this
>> code starting in 2017, but I believe it was modelled after what Xen does,
>> and for the same reason -- to effectively create a per-CPU interrupt on
>> x86/x64. ACRN is also using HYPERVISOR_CALLBACK_VECTOR, but I
>> don't know if that is also to create a per-CPU interrupt.
>
> If you create a vector, it becomes per-CPU. There is simply no mapping
> from HYPERVISOR_CALLBACK_VECTOR to request_percpu_irq(). But if we had
> this…
>
> …
>>> What clears this? This is wrongly placed. This should go to
>>> sysvec_hyperv_callback() instead with its matching canceling part. The
>>> add_interrupt_randomness() should also be there and not here.
>>> sysvec_hyperv_stimer0() managed to do so.
>>
>> I don't have any knowledge to bring regarding the use of
>> lockdep_hardirq_threaded().
>
> It is used in the IRQ core to mark the execution of an interrupt handler
> which becomes threaded in a forced-threading scenario. The goal is to let
> lockdep know that this piece of code, which runs in hardirq context on
> !RT, will be threaded on RT, and that there is therefore no need to
> report a possible locking problem that will not exist on RT.
>
>>> Different question: What guarantees that there won't be another
>>> interrupt before this one is done? The handshake appears to be
>>> deprecated. The interrupt itself returns ACKing (or not) but the actual
>>> handler is delayed to this thread. Depending on the userland it could
>>> take some time and I don't know how impatient the host is.
>>
>> In more recent versions of Hyper-V, what's deprecated is Hyper-V
>> implicitly and automatically doing the EOI. So in sysvec_hyperv_callback(),
>> apic_eoi() is usually explicitly called to ack the interrupt.
>>
>> There's no guarantee, in either the existing case or the new PREEMPT_RT
>> case, that another VMBus interrupt won't come in on the same CPU
>> before the tasklets scheduled by vmbus_message_sched() or
>> vmbus_chan_sched() have run. From a functional standpoint, the Linux
>> code and its interaction with Hyper-V handle another interrupt correctly.
>
> So there is no scenario in which the host will keep triggering interrupts
> because the guest leaves the ISR without doing anything / making progress?
>
>> From a delay standpoint, there's not a problem for the normal (i.e., not
>> PREEMPT_RT) case because the tasklets run as the interrupt exits -- they
>> don't end up in ksoftirqd. For the PREEMPT_RT case, I can see your point
>> about delays since the tasklets are scheduled from the new per-CPU thread.
>> But my understanding is that Jan's motivation for these changes is not to
>> achieve true RT behavior, since Hyper-V doesn't provide that anyway.
>> The goal is simply to make PREEMPT_RT builds functional, though Jan may
>> have further comments on the goal.
>
> I would be worried if the host were storming the guest with interrupts
> because it makes no progress.
>
>>>> +	__vmbus_isr();
>>> Moving on. This (trying very hard here) even schedules tasklets. Why?
>>> You need to disable BH before doing so. Otherwise it ends up in
>>> ksoftirqd. You don't want that.
>>
>> Again, Jan can comment on the impact of delays due to ending up
>> in ksoftirqd.
>
> My point is that having this with threaded interrupt support would
> eliminate the usage of tasklets.
>
>>> Couldn't the whole logic be integrated into the IRQ code? Then we could
>>> have mask/unmask if supported/provided, and threaded interrupts. Then
>>> sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
>>> instead of apic_eoi() + schedule_delayed_work().
>>
>> As I described above, Hyper-V needs a per-CPU interrupt. It's faked up
>> on x86/x64 with the hardcoded HYPERVISOR_CALLBACK_VECTOR sysvec
>> entry, but on arm64 a normal Linux per-CPU IRQ is used. Once the
>> execution path gets to vmbus_isr(), the two architectures share the same
>> code. The same is done with the Hyper-V STIMER0 interrupt as a per-CPU
>> interrupt.
>
> This one has the "random" collecting in the right spot.
>
>> If there's a better way to fake up a per-CPU interrupt on x86/x64, I'm
>> open to looking at it.
>>
>> As I recently discovered in discussion with Jan, standard Linux IRQ
>> handling will *not* thread per-CPU interrupts. So even on arm64, where a
>> standard Linux per-CPU IRQ is used for the VMBus and STIMER0 interrupts,
>> we can't request threading.
>
> It would require a statement from the x86 & IRQ maintainers whether it is
> worthwhile on x86 to allow passing HYPERVISOR_CALLBACK_VECTOR to
> request_percpu_irq() and to have an IRQF_ flag saying that this one needs
> to be forced threaded. Otherwise we would need to remain with the
> workarounds.
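For illustration, the registration Sebastian suggests might look roughly like the sketch below. Note that this is entirely hypothetical: the IRQF_FORCE_THREADED flag does not exist, the IRQ core currently never force-threads per-CPU interrupts, and on x86 there is no Linux IRQ behind HYPERVISOR_CALLBACK_VECTOR (vmbus_irq and vmbus_evt are the identifiers the existing arm64 path uses).

```c
/*
 * Hypothetical sketch only: IRQF_FORCE_THREADED does not exist, the IRQ
 * core refuses to force-thread per-CPU interrupts, and on x86 there is
 * currently no Linux IRQ (vmbus_irq) mapped to HYPERVISOR_CALLBACK_VECTOR.
 */
static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
{
	vmbus_isr();	/* would run in the forced-threaded handler on RT */
	return IRQ_HANDLED;
}

static int vmbus_setup_percpu_irq(void)
{
	return __request_percpu_irq(vmbus_irq, vmbus_percpu_isr,
				    IRQF_FORCE_THREADED /* hypothetical */,
				    "Hyper-V VMbus", vmbus_evt);
}
```

On arm64 today the equivalent registration already exists via request_percpu_irq() with no flags; the open question is whether x86 could be given the same mapping plus a forced-threading flag.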
>
> If you say that an interrupt storm cannot occur, I would prefer
>
> | static DEFINE_WAIT_OVERRIDE_MAP(vmbus_map, LD_WAIT_CONFIG);
> | …
> |	lock_map_acquire_try(&vmbus_map);
> |	__vmbus_isr();
> |	lock_map_release(&vmbus_map);
>
> since it has mostly the same effect.
>
> Either way, that add_interrupt_randomness() should be moved to
> sysvec_hyperv_callback() like it has been done for
> sysvec_hyperv_stimer0(). It would otherwise be invoked twice if it gets
> there via vmbus_percpu_isr().
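For comparison, the sysvec_hyperv_stimer0() pattern referred to above collects interrupt randomness and does the EOI directly in the sysvec entry rather than in a deferred handler. A paraphrased sketch (based on arch/x86/kernel/cpu/mshyperv.c; exact details vary by kernel version):

```c
/*
 * Paraphrased sketch of the stimer0 sysvec entry, not an exact copy:
 * randomness collection and the EOI happen in the entry itself.
 */
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_stimer0)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	inc_irq_stat(hyperv_stimer0_count);
	if (hv_stimer0_handler)
		hv_stimer0_handler();
	add_interrupt_randomness(HYPERV_STIMER0_VECTOR);
	apic_eoi();

	set_irq_regs(old_regs);
}
```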
No, this would degrade arm64.

Jan

--
Siemens AG, Foundational Technologies
Linux Expert Center

