From: Long Li <[email protected]> Sent: Monday, March 16, 2026 10:38 AM
> 
> > Subject: [EXTERNAL] RE: [PATCH] PCI: hv: Set default NUMA node to 0 for
> > devices without affinity info
> >
> > From: Long Li <[email protected]> Sent: Thursday, March 12, 2026 3:33 PM
> > >
> > > When a Hyper-V PCI device does not have
> > > HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
> > > virtual_numa_node, hv_pci_assign_numa_node() leaves the device NUMA
> > > node unset. On x86_64, the default NUMA node happens to be 0, but on
> > > ARM64 it is NUMA_NO_NODE (-1), leading to inconsistent behavior across
> > > architectures.
> > >
> > > In Azure, when no NUMA information is available from the host, devices
> > > perform best when assigned to node 0. Set the device NUMA node to 0
> > > unconditionally before the conditional NUMA affinity check, so that
> > > devices always get a valid default and behavior is consistent on both
> > > x86_64 and ARM64.
> >
> > I'm wondering if this is the right overall approach to the inconsistency.
> > Arguably, the arm64 value of NUMA_NO_NODE is more correct when the
> > Hyper-V host has not provided any NUMA information to the guest. Maybe
> > the x86/x64 side should be changed to default to NUMA_NO_NODE when
> > there's no NUMA information provided.
> 
> Tests have shown that when Azure doesn't provide NUMA information for a PCI
> device, workloads run best when the node defaults to 0. NUMA_NO_NODE results
> in performance degradation on ARM64. This affects most high-performance
> devices like MANA when tested to line rate.
> 
> >
> > The observed x86/x64 default of NUMA node 0 does not come from x86/x64
> > architecture-specific PCI code. It's a Hyper-V-specific behavior due to how
> > hv_pci_probe() allocates the struct hv_pcibus_device, with its embedded
> > struct pci_sysdata. That struct pci_sysdata has a "node" field that the
> > x86/x64 __pcibus_to_node() function accesses when called from
> > pci_device_add(). If hv_pci_probe() were to initialize that "node" field
> > to NUMA_NO_NODE at the same time that it sets the "domain" field, x86/x64
> > guests on Hyper-V would see the PCI device NUMA node default to
> > NUMA_NO_NODE like on arm64. The current behavior of letting the sysdata
> > "node" field stay zero as allocated might just be a historical oversight
> > that no one noticed.
> 
> I agree this was an oversight in the original X64 code, in that it defaults
> to NUMA node 0 by chance. But that turns out to be the ideal node
> configuration for Azure when affinity information is not available through
> vPCI (i.e., non-isolated VM sizes). This results in X64 performing better
> than ARM64 on multi-NUMA non-isolated VM sizes.
> 
> >
> > Are there any observed problems on arm64 with the default being
> > NUMA_NO_NODE? If there are such problems, they should be fixed separately,
> > since that case also needs to work for a kernel built with CONFIG_NUMA=n,
> > where pcibus_to_node() will return NUMA_NO_NODE, making the default on
> > x86/x64 be NUMA_NO_NODE as well.
> >
> > I've tested setting sysdata->node to NUMA_NO_NODE in hv_pci_probe(), and
> > didn't see any obvious problems in an x86/x64 Azure VM with a MANA VF and
> > multiple NVMe pass-thru devices. The NUMA node reported in /sys for these
> > PCI devices is indeed NUMA_NO_NODE. But maybe there's some other issue
> > that I'm not aware of.
> 
> Extensive tests have shown that defaulting the NUMA node to 0 preserves the
> existing behavior on X64 while improving performance on ARM64, especially
> for MANA. This has been confirmed by the Hyper-V team, and Windows VMs use
> the same default values.

Ah, OK.  That makes sense.  I'd suggest doing a new version of the patch with
the commit message and the code comment describing performance as the
main reason for the patch.  You somewhat said that in your current commit
message, but it got muddled with the compatibility discussion, and the code
comment just mentions compatibility. Compatibility between x86/x64 and
arm64 isn't really the issue. The idea is that hv_pci_assign_numa_node() should
always set the NUMA node to something, rather than depending on the default,
which might be NUMA_NO_NODE. If the Hyper-V host provides a NUMA node,
use that. But if not, use node 0 because that is usually where the underlying
hardware actually has the physical device attached. Node 0 might not be
right in certain situations, but if Hyper-V doesn't provide more information
to the guest, guessing node 0 is better than letting the Linux kernel do
something like load balancing across NUMA nodes, which could happen
with NUMA_NO_NODE.  (At least, that's what I think happens!)

Michael

> 
> Thanks,
> 
> Long
> 
> >
> > Michael
> >
> > >
> > > Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
> > > Signed-off-by: Long Li <[email protected]>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > > index 2c7a406b4ba8..5c03b6e4cdab 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -2485,6 +2485,9 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> > >           if (!hv_dev)
> > >                   continue;
> > >
> > > +         /* Default to node 0 for consistent behavior across architectures */
> > > +         set_dev_node(&dev->dev, 0);
> > > +
> > >           if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
> > >               hv_dev->desc.virtual_numa_node < num_possible_nodes())
> > >                   /*
> > > --
> > > 2.43.0
> > >
> 

