From: Nuno Das Neves <nunodasne...@linux.microsoft.com> Sent: Wednesday, June 18, 2025 2:08 PM > > On 6/11/2025 4:07 PM, Michael Kelley wrote: > > From: Nuno Das Neves <nunodasne...@linux.microsoft.com> Sent: Tuesday, June > 10, 2025 4:52 PM > >> > >> From: Stanislav Kinsburskii <skinsburs...@linux.microsoft.com> > > > > The preferred patch Subject prefix is "x86/hyperv:" > > > > Thank you for clarifying - I thought I saw some precedent for x86: hyperv: > but must have been mistaken. > > >> > >> This patch moves a part of currently internal logic into the > >> hv_map_msi_interrupt function and makes it globally available helper > >> function, which will be used to map PCI interrupts in case of root > >> partition. > > > > Avoid "this patch" in commit messages. Suggest: > > > > Create a helper function hv_map_msi_interrupt() that contains some > > logic that is currently internal to irqdomain.c. Make the helper function > > globally available so it can be used to map PCI interrupts when running > > in the root partition. > > > > Thanks, I'll rephrase. > > >> > >> Signed-off-by: Stanislav Kinsburskii <skinsburs...@linux.microsoft.com> > >> Signed-off-by: Nuno Das Neves <nunodasne...@linux.microsoft.com> > >> --- > >> arch/x86/hyperv/irqdomain.c | 47 ++++++++++++++++++++++++--------- > >> arch/x86/include/asm/mshyperv.h | 2 ++ > >> 2 files changed, 36 insertions(+), 13 deletions(-) > >> > >> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > >> index 31f0d29cbc5e..82f3bafb93d6 100644 > >> --- a/arch/x86/hyperv/irqdomain.c > >> +++ b/arch/x86/hyperv/irqdomain.c > >> @@ -169,13 +169,40 @@ static union hv_device_id hv_build_pci_dev_id(struct > >> pci_dev *dev) > >> return dev_id; > >> } > >> > >> -static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector, > >> - struct hv_interrupt_entry *entry) > >> +/** > >> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor. > >> + * @data: Describes the IRQ > >> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL) > >> + * > >> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT > >> hypercall. > >> + */ > >> +int hv_map_msi_interrupt(struct irq_data *data, > >> + struct hv_interrupt_entry *out_entry) > >> { > >> - union hv_device_id device_id = hv_build_pci_dev_id(dev); > >> + struct msi_desc *msidesc; > >> + struct pci_dev *dev; > >> + union hv_device_id device_id; > >> + struct hv_interrupt_entry dummy; > >> + struct irq_cfg *cfg = irqd_cfg(data); > >> + const cpumask_t *affinity; > >> + int cpu; > >> + u64 res; > >> > >> - return hv_map_interrupt(device_id, false, cpu, vector, entry); > >> + msidesc = irq_data_get_msi_desc(data); > >> + dev = msi_desc_to_pci_dev(msidesc); > >> + device_id = hv_build_pci_dev_id(dev); > >> + affinity = irq_data_get_effective_affinity_mask(data); > >> + cpu = cpumask_first_and(affinity, cpu_online_mask); > > > > Is the cpus_read_lock held at this point? I'm not sure what the > > overall calling sequence looks like. If it is not held, the CPU that > > is selected could go offline before hv_map_interrupt() is called. > > This computation of the target CPU is the same as in the code > > before this patch, but that existing code looks like it has the > > same problem. > > > > Thanks for pointing it out - It *looks* like the read lock is not held > everywhere this could be called, so it could indeed be a problem. > > I've been thinking about different ways around this but I lack the > knowledge to have an informed opinion about it: > > - We could take the cpu read lock in this function, would that work? > > - I'm not actually sure why the code is getting the first cpu off the > effective > affinity mask in the first place. It is possible to get the apic id (and > hence > the cpu) already associated with the irq, as per e.g. > x86_vector_msi_compose_msg() > Maybe we could get the cpu that way, assuming that doesn't have a similar > issue. > > - We could just let this race happen, maybe the outcome isn't too > catastrophic? > > What do you think?
I would have to study further to provide good answers to your questions as I don't have deep knowledge of this area off the top of my head. The code looked suspicious because AND'ing the affinity with the cpu_online_mask in the first place is presumably to prevent assigning the interrupt to a CPU that is offline. That's a valid intent, since such assigning would indeed be problematic. But as written the code is inherently racy unless the cpus_read_lock() is held. I'm on vacation all next week, and probably won't be able to look at this again until early July. So the best I can do for now is flag the issue. Michael > > >> + > >> + res = hv_map_interrupt(device_id, false, cpu, cfg->vector, > >> + out_entry ? out_entry : &dummy); > >> + if (!hv_result_success(res)) > >> + pr_err("%s: failed to map interrupt: %s", > >> + __func__, hv_result_to_string(res)); > > > > hv_map_interrupt() already outputs a message if the hypercall > > fails. Is another message needed here? > > > > It does print the function name, which gives additional context. > Probably it can just be removed however. > > >> + > >> + return hv_result_to_errno(res); > > > > The error handling is rather messed up. First hv_map_interrupt() > > sometimes returns a Linux errno (not negated), and sometimes a > > hypercall result. The errno is EINVAL, which has value "22", which is > > the same as hypercall result HV_STATUS_ACKNOWLEDGED. And > > the hypercall result returned from hv_map_interrupt() is just > > the result code, not the full 64-bit status, as hv_map_interrupt() > > has already done hv_result(status). Hence the "res" input arg to > > hv_result_to_errno() isn't really the correct input. For example, > > if the hypercall returns U64_MAX, that won't be caught by > > hv_result_to_errno() since the value has been truncated to > > 32 bits. Fixing all this will require some unscrambling. > > > > Good point, it's pretty messed up! I think in v2 I'll add a patch to > clean this up first. > > >> } > >> +EXPORT_SYMBOL_GPL(hv_map_msi_interrupt); > >> > >> static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, > >> struct > msi_msg *msg) > >> { > >> @@ -190,10 +217,8 @@ static void hv_irq_compose_msi_msg(struct irq_data > *data, struct msi_msg *msg) > >> { > >> struct msi_desc *msidesc; > >> struct pci_dev *dev; > >> - struct hv_interrupt_entry out_entry, *stored_entry; > >> + struct hv_interrupt_entry *stored_entry; > >> struct irq_cfg *cfg = irqd_cfg(data); > >> - const cpumask_t *affinity; > >> - int cpu; > >> u64 status; > >> > >> msidesc = irq_data_get_msi_desc(data); > >> @@ -204,9 +229,6 @@ static void hv_irq_compose_msi_msg(struct irq_data > >> *data, > struct msi_msg *msg) > >> return; > >> } > >> > >> - affinity = irq_data_get_effective_affinity_mask(data); > >> - cpu = cpumask_first_and(affinity, cpu_online_mask); > >> - > >> if (data->chip_data) { > >> /* > >> * This interrupt is already mapped. Let's unmap first. > >> @@ -235,15 +257,14 @@ static void hv_irq_compose_msi_msg(struct irq_data > *data, struct msi_msg *msg) > >> return; > >> } > >> > >> - status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry); > >> + status = hv_map_msi_interrupt(data, stored_entry); > >> if (status != HV_STATUS_SUCCESS) { > > > > hv_map_msi_interrupt() returns an errno, so testing for HV_STATUS_SUCCESS > > is bogus. > > > > Thanks, noted. > > >> kfree(stored_entry); > >> return; > >> } > >> > >> - *stored_entry = out_entry; > >> data->chip_data = stored_entry; > >> - entry_to_msi_msg(&out_entry, msg); > >> + entry_to_msi_msg(data->chip_data, msg); > >> > >> return; > >> } > >> diff --git a/arch/x86/include/asm/mshyperv.h > >> b/arch/x86/include/asm/mshyperv.h > >> index 5ec92e3e2e37..843121465ddd 100644 > >> --- a/arch/x86/include/asm/mshyperv.h > >> +++ b/arch/x86/include/asm/mshyperv.h > >> @@ -261,6 +261,8 @@ static inline void hv_apic_init(void) {} > >> > >> struct irq_domain *hv_create_pci_msi_domain(void); > >> > >> +int hv_map_msi_interrupt(struct irq_data *data, > >> + struct hv_interrupt_entry *out_entry); > >> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int > >> vector, > >> struct hv_interrupt_entry *entry); > >> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > >> *entry); > >> -- > >> 2.34.1