Hi Dan,

> -----Original Message-----
> From: Dan Williams <dan.j.willi...@intel.com>
> Sent: Friday, September 10, 2021 11:42 PM
> To: Justin He <justin...@arm.com>
> Cc: Vishal Verma <vishal.l.ve...@intel.com>; Dave Jiang
> <dave.ji...@intel.com>; David Hildenbrand <da...@redhat.com>; Linux NVDIMM
> <nvd...@lists.linux.dev>; Linux Kernel Mailing List <linux-
> ker...@vger.kernel.org>
> Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> invalid
> 
> On Fri, Sep 10, 2021 at 5:46 AM Jia He <justin...@arm.com> wrote:
> >
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa case.
> >
> > Without this patch, pmem can't be probed as RAM devices on arm64 if
> > SRAT table isn't present:
> >   $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> >   kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> >   kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <da...@redhat.com>
> > Signed-off-by: Jia He <justin...@arm.com>
> > ---
> > v2: - rebase it based on David's "memory group" patch.
> >     - drop the changes in dev_dax_kmem_remove() since nid had been
> >       removed in remove_memory().
> >  drivers/dax/kmem.c | 31 +++++++++++++++++--------------
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index a37622060fff..e4836eb7539e 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >         unsigned long total_len = 0;
> >         struct dax_kmem_data *data;
> >         int i, rc, mapped = 0;
> > -       int numa_node;
> > -
> > -       /*
> > -        * Ensure good NUMA information for the persistent memory.
> > -        * Without this check, there is a risk that slow memory
> > -        * could be mixed in a node with faster memory, causing
> > -        * unavoidable performance issues.
> > -        */
> > -       numa_node = dev_dax->target_node;
> > -       if (numa_node < 0) {
> > -               dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > -                               numa_node);
> > -               return -EINVAL;
> > -       }
> > +       int numa_node = dev_dax->target_node;
> >
> >         for (i = 0; i < dev_dax->nr_range; i++) {
> >                 struct range range;
> > @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >                                         i, range.start, range.end);
> >                         continue;
> >                 }
> > +
> > +               /*
> > +                * Ensure good NUMA information for the persistent memory.
> > +                * Without this check, there is a risk but not fatal that slow
> > +                * memory could be mixed in a node with faster memory, causing
> > +                * unavoidable performance issues. Warn this and use fallback
> > +                * node id.
> > +                */
> > +               if (numa_node < 0) {
> > +                       int new_node = memory_add_physaddr_to_nid(range.start);
> > +
> > +                       dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> > +                                numa_node, new_node, range.start, range.end);
> > +                       numa_node = new_node;
> > +               }
> > +
> >                 total_len += range_len(&range);
> 
> This fallback change belongs where the parent region for the namespace
> adopts its target_node, because it's not clear
> memory_add_physaddr_to_nid() is the right fallback in all situations.
> Here is where this setting is happening currently:
> 
> drivers/acpi/nfit/core.c:3004:          ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
On my local arm64 guest ('virt' machine type), target_node is set to -1
at this line. That is, the condition "spa->flags & ACPI_NFIT_PROXIMITY_VALID"
is true, but the assignment still yields NUMA_NO_NODE.
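
For illustration only, here is a small userspace model of the path I am
describing. It is not kernel code: every model_* function, the
MODEL_PROXIMITY_VALID flag value and the "fallback always returns node 0"
behaviour are hypothetical stand-ins that I made up to mirror the commit
message (numa_off set, so the pxm lookup gives -1, so kmem rejects the
region unless a fallback nid is used).

```c
#include <stdbool.h>
#include <errno.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical flag value, standing in for ACPI_NFIT_PROXIMITY_VALID. */
#define MODEL_PROXIMITY_VALID 0x1u

/* Models numa_off having been set by dummy_numa_init(). */
bool model_numa_off = true;

/* Stand-in for the pxm -> node lookup: gives up when numa is "off". */
int model_pxm_to_node(int pxm)
{
	if (model_numa_off)
		return NUMA_NO_NODE;
	return pxm; /* identity mapping, purely for illustration */
}

/* Stand-in for memory_add_physaddr_to_nid(): assumed to return node 0. */
int model_physaddr_to_nid(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/* Models how the region picks target_node from the SPA flags. */
int model_region_target_node(unsigned int flags, int pxm)
{
	if (flags & MODEL_PROXIMITY_VALID)
		return model_pxm_to_node(pxm);
	return NUMA_NO_NODE;
}

/*
 * Models the kmem probe decision: returns the node that would be used,
 * or -EINVAL when an invalid node is rejected (the pre-patch behaviour).
 */
int model_kmem_probe(int target_node, unsigned long long start,
		     bool use_fallback)
{
	if (target_node < 0) {
		if (!use_fallback)
			return -EINVAL;
		target_node = model_physaddr_to_nid(start);
	}
	return target_node;
}
```

In this model, with model_numa_off set, model_region_target_node() returns
NUMA_NO_NODE even though the proximity flag is valid; model_kmem_probe()
then returns -EINVAL without the fallback and 0 with it.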

> drivers/acpi/nfit/core.c:3007:          ndr_desc->target_node = NUMA_NO_NODE;
> drivers/nvdimm/e820.c:29:       ndr_desc.target_node = nid;
> drivers/nvdimm/of_pmem.c:58:            ndr_desc.target_node = ndr_desc.numa_node;
> drivers/nvdimm/region_devs.c:1127:      nd_region->target_node = ndr_desc->target_node;


Sorry, Dan. I think I missed your previous mail:

=========================================
Looks like it is the NFIT driver, thanks.

If you're getting NUMA_NO_NODE in dax_kmem from the NFIT driver it
means your ACPI NFIT table is failing to populate correct numa
information. You could try the following to fix it up, but I think the
real problem is that your platform BIOS needs to add the proper numa
data.

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index fb775b967c52..d3a0cec635b1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3005,15 +3005,8 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
        ndr_desc->res = &res;
        ndr_desc->provider_data = nfit_spa;
        ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
-       if (spa->flags & ACPI_NFIT_PROXIMITY_VALID) {
-               ndr_desc->numa_node = acpi_map_pxm_to_online_node(
-                                               spa->proximity_domain);
-               ndr_desc->target_node = acpi_map_pxm_to_node(
-                               spa->proximity_domain);
-       } else {
-               ndr_desc->numa_node = NUMA_NO_NODE;
-               ndr_desc->target_node = NUMA_NO_NODE;
-       }
+       ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
+       ndr_desc->target_node = phys_to_target_node(spa->address);

        /*
         * Persistence domain bits are hierarchical, if
===================================================

Do you still suggest fixing it like this?


--
Cheers,
Justin (Jia He)


