* Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> [2020-08-17 17:04:24]:

> On 8/17/20 4:29 PM, Srikar Dronamraju wrote:
> > * Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> [2020-08-17 16:02:36]:
> > 
> > > We use ibm,associativity and ibm,associativity-lookup-arrays to derive
> > > the NUMA node numbers. These device tree properties are
> > > firmware-indicated groupings of resources based on their hierarchy in
> > > the platform. These numbers (group ids) are not sequential, and the
> > > hypervisor/firmware can follow different numbering schemes. For
> > > example, on PowerNV platforms we group them in the order below.
> > > 
> > >   *     - CCM node ID
> > >   *     - HW card ID
> > >   *     - HW module ID
> > >   *     - Chip ID
> > >   *     - Core ID
> > > 
> > > Based on ibm,associativity-reference-points, we use one of the above
> > > group ids as the Linux NUMA node id (on the PowerNV platform, Chip ID
> > > is used). This results in Linux reporting non-linear NUMA node ids,
> > > which in turn results in Linux reporting an empty NUMA node 0.
> > > 
> > > This can be resolved by mapping the firmware-provided group id to a
> > > logical Linux NUMA id. In this patch, we do this only for pseries
> > > platforms, considering that the firmware group id is a virtualized
> > > entity and users would not have drawn any conclusions based on the
> > > Linux NUMA node id.
> > > 
> > > On the PowerNV platform, since we have historically mapped Chip ID to
> > > the Linux NUMA node id, we keep the existing Linux NUMA node id
> > > numbering.
> > 
> > I still don't understand how you are going to handle NUMA distances.
> > With your patch, have you tried DLPAR add/remove on a sparsely noded
> > machine?
> > 
> 
> We follow the same steps when fetching distance information. Instead of
> using affinity domain id, we now use the mapped node id. The relevant hunk
> in the patch is
> 
> +     nid = affinity_domain_to_nid(&domain);
> 
>       if (nid > 0 &&
> -             of_read_number(associativity, 1) >= distance_ref_points_depth) {
> +         of_read_number(associativity, 1) >= distance_ref_points_depth) {
>               /*
>                * Skip the length field and send start of associativity array
>                */
> 
> I haven't tried dlpar add/remove. I don't have a setup to try that. Do you
> see a problem there?
> 

Yes, I think there can be 2 problems.

1. The distance table may be filled with incorrect data.
2. The distance table shown by numactl -H is symmetric today; that
   symmetry may be lost.
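
To make the two concerns concrete, here is a small user-space sketch. It is
not the kernel code and not the patch: every name in it (map_domain_to_nid,
record_associativity, node_distance, table_is_sane, MAX_NODES, REF_DEPTH) is
made up for illustration. It assumes sparse firmware domain ids are mapped to
logical nids in first-seen order, that each nid keeps one row of reference
point ids ordered from finest to coarsest level, and that distance doubles
for every level at which two nodes differ. The sanity check at the end is the
kind of test one might run after a DLPAR add/remove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES      8   /* hypothetical limit for this sketch */
#define REF_DEPTH      2   /* hypothetical number of reference-point levels */
#define LOCAL_DISTANCE 10  /* distance of a node to itself */

/* Logical nid -> firmware domain id, filled in first-seen order. */
static uint32_t nid_to_domain[MAX_NODES];
static int nr_mapped;

/* Per-nid reference-point ids, finest level first (sketch assumption). */
static uint32_t lookup[MAX_NODES][REF_DEPTH];
static bool lookup_valid[MAX_NODES];

/* Return the logical nid for a domain id, assigning the next free one. */
int map_domain_to_nid(uint32_t domain)
{
	for (int i = 0; i < nr_mapped; i++)
		if (nid_to_domain[i] == domain)
			return i;
	if (nr_mapped == MAX_NODES)
		return -1;
	nid_to_domain[nr_mapped] = domain;
	return nr_mapped++;
}

/* Store the reference-point row under the mapped nid, not the domain id. */
int record_associativity(uint32_t domain, const uint32_t ref[REF_DEPTH])
{
	int nid = map_domain_to_nid(domain);

	if (nid < 0)
		return -1;
	for (int i = 0; i < REF_DEPTH; i++)
		lookup[nid][i] = ref[i];
	lookup_valid[nid] = true;
	return nid;
}

/* Double the distance for each level at which the two rows differ. */
int node_distance(int a, int b)
{
	int distance = LOCAL_DISTANCE;

	assert(a >= 0 && a < nr_mapped && b >= 0 && b < nr_mapped);
	for (int i = 0; i < REF_DEPTH; i++) {
		if (lookup[a][i] == lookup[b][i])
			break;
		distance *= 2;
	}
	return distance;
}

/* True if every mapped nid has a filled row and distances are symmetric. */
bool table_is_sane(void)
{
	for (int a = 0; a < nr_mapped; a++) {
		if (!lookup_valid[a])
			return false;
		if (node_distance(a, a) != LOCAL_DISTANCE)
			return false;
		for (int b = 0; b < a; b++)
			if (node_distance(a, b) != node_distance(b, a))
				return false;
	}
	return true;
}
```

In this model a nid that was handed out by map_domain_to_nid() but never got
a row from record_associativity() fails the check, which is roughly the
"filled with incorrect data" case. Whether the real patch can reach that
state on a sparsely noded machine is exactly what would need testing.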

> -aneesh
> 
> 

-- 
Thanks and Regards
Srikar Dronamraju
