From: Long Li <[email protected]> Sent: Thursday, March 12, 2026 3:33 PM
>
> When a Hyper-V PCI device does not have
> HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
> virtual_numa_node, hv_pci_assign_numa_node() leaves the device
> NUMA node unset. On x86_64, the default NUMA node happens to be
> 0, but on ARM64 it is NUMA_NO_NODE (-1), leading to inconsistent
> behavior across architectures.
>
> In Azure, when no NUMA information is available from the host,
> devices perform best when assigned to node 0. Set the device NUMA
> node to 0 unconditionally before the conditional NUMA affinity
> check, so that devices always get a valid default and behavior is
> consistent on both x86_64 and ARM64.

I'm wondering if this is the right overall approach to the inconsistency.
Arguably, the arm64 value of NUMA_NO_NODE is more correct when the
Hyper-V host has not provided any NUMA information to the guest. Maybe
the x86/x64 side should be changed to default to NUMA_NO_NODE when
there's no NUMA information provided.

The observed x86/x64 default of NUMA node 0 does not come from x86/x64
architecture-specific PCI code. It's a Hyper-V specific behavior due to how
hv_pci_probe() allocates the struct hv_pcibus_device, with its embedded
struct pci_sysdata. That struct pci_sysdata has a "node" field that the x86/x64
__pcibus_to_node() function accesses when called from pci_device_add().
If hv_pci_probe() were to initialize that "node" field to NUMA_NO_NODE at
the same time that it sets the "domain" field, x86/x64 guests on Hyper-V
would see the PCI device NUMA node default to NUMA_NO_NODE like on
arm64. The current behavior of letting the sysdata "node" field stay zero
as allocated might just be a historical oversight that no one noticed.

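In other words, something along these lines (a sketch only -- the exact
context differs by kernel version, and on recent kernels the sysdata
"domain" assignment sits under #ifdef CONFIG_X86):

--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ hv_pci_probe
 	hbus->sysdata.domain = dom;
+	hbus->sysdata.node = NUMA_NO_NODE;
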
Are there any observed problems on arm64 with the default being
NUMA_NO_NODE? If there are such problems, they should be fixed
separately, because the NUMA_NO_NODE case has to work anyway for a
kernel built with CONFIG_NUMA=n: in that configuration pcibus_to_node()
returns NUMA_NO_NODE, making the default on x86/x64 be NUMA_NO_NODE
as well.

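For reference, the !CONFIG_NUMA fallback is (quoting the asm-generic
topology header from memory, so double-check the exact location):

#ifndef pcibus_to_node
#define pcibus_to_node(bus)	((void)(bus), -1)
#endif

where -1 is NUMA_NO_NODE.
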
I've tested setting sysdata->node to NUMA_NO_NODE in hv_pci_probe(),
and didn't see any obvious problems in an x86/x64 Azure VM with a
MANA VF and multiple NVMe pass-thru devices. The NUMA node
reported in /sys for these PCI devices is indeed NUMA_NO_NODE.
But maybe there's some other issue that I'm not aware of.

Michael
>
> Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
> Signed-off-by: Long Li <[email protected]>
> ---
> drivers/pci/controller/pci-hyperv.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 2c7a406b4ba8..5c03b6e4cdab 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2485,6 +2485,9 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
>  		if (!hv_dev)
>  			continue;
>
> +		/* Default to node 0 for consistent behavior across architectures */
> +		set_dev_node(&dev->dev, 0);
> +
>  		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
>  		    hv_dev->desc.virtual_numa_node < num_possible_nodes())
>  			/*
> --
> 2.43.0
>