> -----Original Message-----
> From: Michael Kelley <[email protected]>
> Sent: Monday, March 16, 2026 12:16 PM
> To: Long Li <[email protected]>; Michael Kelley <[email protected]>;
> KY Srinivasan <[email protected]>; Haiyang Zhang
> <[email protected]>; Wei Liu <[email protected]>; Dexuan Cui
> <[email protected]>; Lorenzo Pieralisi <[email protected]>; Krzysztof
> Wilczyński <[email protected]>; Manivannan Sadhasivam
> <[email protected]>; Bjorn Helgaas <[email protected]>
> Cc: Rob Herring <[email protected]>; Michael Kelley <[email protected]>;
> [email protected]; [email protected]; [email protected]
> Subject: [EXTERNAL] RE: [PATCH] PCI: hv: Set default NUMA node to 0 for
> devices without affinity info
>
> From: Long Li <[email protected]> Sent: Monday, March 16, 2026 10:38 AM
> >
> > > Subject: [EXTERNAL] RE: [PATCH] PCI: hv: Set default NUMA node to 0
> > > for devices without affinity info
> > >
> > > From: Long Li <[email protected]> Sent: Thursday, March 12, 2026 3:33 PM
> > > >
> > > > When a Hyper-V PCI device does not have
> > > > HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
> > > > virtual_numa_node, hv_pci_assign_numa_node() leaves the device
> > > > NUMA node unset. On x86_64, the default NUMA node happens to be 0,
> > > > but on
> > > > ARM64 it is NUMA_NO_NODE (-1), leading to inconsistent behavior
> > > > across architectures.
> > > >
> > > > In Azure, when no NUMA information is available from the host,
> > > > devices perform best when assigned to node 0. Set the device NUMA
> > > > node to 0 unconditionally before the conditional NUMA affinity
> > > > check, so that devices always get a valid default and behavior is
> > > > consistent on both
> > > > x86_64 and ARM64.
> > >
> > > I'm wondering if this is the right overall approach to the inconsistency.
> > > Arguably, the arm64 value of NUMA_NO_NODE is more correct when the
> > > Hyper-V host has not provided any NUMA information to the guest.
> > > Maybe the x86/x64 side should be changed to default to NUMA_NO_NODE
> > > when there's no NUMA information provided.
> >
> > Tests have shown that when Azure doesn't provide NUMA information for a
> > PCI device, workloads run best when the node defaults to 0. NUMA_NO_NODE
> > results in performance degradation on ARM64. This affects most
> > high-performance devices, such as MANA, when tested at line rate.
> >
> > >
> > > The observed x86/x64 default of NUMA node 0 does not come from
> > > x86/x64 architecture-specific PCI code. It's a Hyper-V specific
> > > behavior due to how hv_pci_probe() allocates the struct
> > > hv_pcibus_device, with its embedded struct pci_sysdata. That struct
> > > pci_sysdata has a "node" field that the x86/x64 __pcibus_to_node()
> > > function accesses when called from pci_device_add(). If hv_pci_probe()
> > > were to initialize that "node" field to NUMA_NO_NODE at the same time
> > > that it sets the "domain" field, x86/x64 guests on Hyper-V would see
> > > the PCI device NUMA node default to NUMA_NO_NODE like on arm64. The
> > > current behavior of letting the sysdata "node" field stay zero as
> > > allocated might just be an historical oversight that no one noticed.
> >
> > I agree this was an oversight in the original X64 code, in that it
> > defaults to NUMA node 0 by chance. But node 0 turns out to be the ideal
> > configuration for Azure when affinity information is not available
> > through vPCI (i.e., non-isolated VM sizes). As a result, X64 performs
> > better than ARM64 on multi-NUMA non-isolated VM sizes.
> >
> > >
> > > Are there any observed problems on arm64 with the default being
> > > NUMA_NO_NODE? If there are such problems, they should be fixed
> > > separately since that case needs to work for a kernel built with
> > > CONFIG_NUMA=n. pcibus_to_node() will return NUMA_NO_NODE, making
> > > the default on x86/x64 be NUMA_NO_NODE as well.
> > >
> > > I've tested setting sysdata->node to NUMA_NO_NODE in hv_pci_probe(),
> > > and didn't see any obvious problems in an x86/x64 Azure VM with a
> > > MANA VF and multiple NVMe pass-thru devices. The NUMA node reported
> > > in /sys for these PCI devices is indeed NUMA_NO_NODE.
> > > But maybe there's some other issue that I'm not aware of.
> >
> > Extensive tests have shown that defaulting the NUMA node to 0 preserves
> > the existing behavior on X64 while improving performance on ARM64,
> > especially for MANA. This has been confirmed by the Hyper-V team, and
> > Windows VMs use the same default.
>
> Ah, OK. That makes sense. I'd suggest doing a new version of the patch
> with the commit message and the code comment describing performance as
> the main reason for the patch. You somewhat said that in your current
> commit message, but it got muddled with the compatibility discussion,
> and the code comment just mentions compatibility. Compatibility between
> x86/x64 and arm64 isn't really the issue. The idea is that
> hv_pci_assign_numa_node() should always set the NUMA node to something,
> rather than depending on the default, which might be NUMA_NO_NODE. If
> the Hyper-V host provides a NUMA node, use that. But if not, use node 0
> because that is usually where the underlying hardware actually has the
> physical device attached. Node 0 might not be right in certain
> situations, but if Hyper-V doesn't provide more information to the
> guest, guessing node 0 is better than letting the Linux kernel do
> something like load balancing across NUMA nodes, which could happen
> with NUMA_NO_NODE. (At least, that's what I think happens!)
>
> Michael
I'm adding the performance part to the commit message in v2. The compatibility
part is still valid in that we want consistent kernel behavior on X64 and on
ARM64.
Long
>
> >
> > Thanks,
> >
> > Long
> >
> > >
> > > Michael
> > >
> > > >
> > > > Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and
> > > > support PCI_BUS_RELATIONS2")
> > > > Signed-off-by: Long Li <[email protected]>
> > > > ---
> > > > drivers/pci/controller/pci-hyperv.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > > b/drivers/pci/controller/pci-hyperv.c
> > > > index 2c7a406b4ba8..5c03b6e4cdab 100644
> > > > --- a/drivers/pci/controller/pci-hyperv.c
> > > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > > @@ -2485,6 +2485,9 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> > > >  		if (!hv_dev)
> > > >  			continue;
> > > > 
> > > > +		/* Default to node 0 for consistent behavior across architectures */
> > > > +		set_dev_node(&dev->dev, 0);
> > > > +
> > > >  		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
> > > >  		    hv_dev->desc.virtual_numa_node < num_possible_nodes())
> > > >  			/*
> > > > --
> > > > 2.43.0
> > > >
> >