Hi Brice,
Thanks a lot for the quick response!
I have tested the patch and it works just fine:-) [1]
> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.
There is no rush, 2.5 sounds great.
Merci beaucoup!
Jirka
[1]
$ ./utils/lstopo/lstopo-no-graphics
Machine (7615MB total)
  Package L#0
    NUMANode L#0 (P#2 7615MB)
    L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (48KB)
        PU L#0 (P#0)
        PU L#1 (P#2)
        PU L#2 (P#4)
        PU L#3 (P#6)
      L1d L#1 (32KB) + L1i L#1 (48KB)
        PU L#4 (P#1)
        PU L#5 (P#3)
        PU L#6 (P#5)
        PU L#7 (P#7)
  Block(Disk) "sda"
  Net "env2"
On Mon, Apr 26, 2021 at 8:43 PM Brice Goglin <[email protected]> wrote:
> This patch should fix the issue. We had to fix the same issue for CPU#0
> being offline recently but I didn't know it could be needed for NUMA node#0
> being offline too.
>
> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.
>
> Brice
>
> commit 7c159d723432e461b4e48cc2d38212913d2ba7c7
> Author: Brice Goglin <[email protected]>
> Date: Mon Apr 26 20:35:42 2021 +0200
>
> linux: fix support for NUMA node0 being offline
>
> Just like we didn't support offline CPU#0 until commit
> 7bcc273efd50536961ba16d474efca4ae163229b, we need to
> support node0 being offline as well.
> It's not clear whether it's a new Linux feature or not,
> this was reported on a POWER LPAR VM.
>
> We opportunistically assume node0 is online to avoid
> the overhead in the vast majority of cases. If node0
> is missing, we parse "online" to find the first node.
>
> Thanks to Jirka Hladky for the report.
>
> Signed-off-by: Brice Goglin <[email protected]>
>
> diff --git a/hwloc/topology-linux.c b/hwloc/topology-linux.c
> index 94b242dd0..10e038e64 100644
> --- a/hwloc/topology-linux.c
> +++ b/hwloc/topology-linux.c
> @@ -5264,6 +5264,9 @@ static const char *find_sysfs_cpu_path(int root_fd, int *old_filenames)
>
>  static const char *find_sysfs_node_path(int root_fd)
>  {
> +  unsigned first;
> +  int err;
> +
>    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
>        && !hwloc_access("/sys/bus/node/devices/node0/cpumap", R_OK, root_fd))
>      return "/sys/bus/node/devices";
> @@ -5272,6 +5275,28 @@ static const char *find_sysfs_node_path(int root_fd)
>        && !hwloc_access("/sys/devices/system/node/node0/cpumap", R_OK, root_fd))
>      return "/sys/devices/system/node";
>
> +  /* node0 might be offline, fallback to looking at the first online node.
> +   * online contains comma-separated ranges, just read the first number.
> +   */
> +  hwloc_debug("Failed to find sysfs node files using node0, looking at online nodes...\n");
> +  err = hwloc_read_path_as_uint("/sys/devices/system/node/online", &first, root_fd);
> +  if (err) {
> +    hwloc_debug("Failed to find read /sys/devices/system/node/online.\n");
> +  } else {
> +    char path[PATH_MAX];
> +    hwloc_debug("Found node#%u as first online node\n", first);
> +
> +    snprintf(path, sizeof(path), "/sys/bus/node/devices/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/bus/node/devices";
> +
> +    snprintf(path, sizeof(path), "/sys/devices/system/node/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/devices/system/node", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/devices/system/node";
> +  }
> +
>    return NULL;
>  }
>
> On 26/04/2021 at 16:48, Brice Goglin wrote:
>
> Hello,
>
> Maybe we have something that assumes that the first NUMA node on Linux is
> #0. And something is wrong in the disallowed case anyway since the NUMA
> node physical number is 0 instead of 2 there.
>
> Can you run "hwloc-gather-topology lpar" and send the resulting
> lpar.tar.bz2? (send it only to me if it's too big or somehow confidential).
>
> Thanks
>
> Brice
>
>
>
> On 26/04/2021 at 16:40, Jirka Hladky wrote:
>
> Hi Brice,
>
> how are you doing? I hope you are fine. We are all well and safe.
>
> I have been running hwloc on an IBM Power LPAR VM with only 1 CPU core and 8
> PUs [1]. There is only one NUMA node, but its numbering is quite strange: the
> NUMA node number is "2". See [2].
>
> hwloc reports "Topology does not contain any NUMA node, aborting!"
>
> $ lstopo
> Topology does not contain any NUMA node, aborting!
> hwloc_topology_load() failed (No such file or directory).
>
> Could you please double-check if this behavior is correct? I believe hwloc
> should work on this HW setup.
>
> FYI, we can get it working with the --disallowed option [3] (but I think it
> should work without this option as well); see the short API-level sketch after
> the output in [3] below.
>
> Thanks a lot!
> Jirka
>
>
> [1] $ lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 0-7
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 1
> NUMA node(s): 1
>
> [2] There is ONE NUMA node with the number "2":
> $ numactl -H
> available: 1 nodes (2)
> node 2 cpus: 0 1 2 3 4 5 6 7
> node 2 size: 7614 MB
> node 2 free: 1098 MB
> node distances:
> node 2
> 2: 10
>
> [3]
> $ lstopo --disallowed
>
> Machine (7615MB total)
>   Package L#0
>     NUMANode L#0 (P#0 7615MB)
>     L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0
>       L1d L#0 (32KB) + L1i L#0 (48KB)
>         Die L#0 + PU L#0 (P#0)
>         PU L#1 (P#2)
>         PU L#2 (P#4)
>         PU L#3 (P#6)
>       L1d L#1 (32KB) + L1i L#1 (48KB)
>         PU L#4 (P#1)
>         PU L#5 (P#3)
>         PU L#6 (P#5)
>         PU L#7 (P#7)
>   Block(Disk) "sda"
>   Net "env2"
--
-Jirka
_______________________________________________
hwloc-devel mailing list
[email protected]
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel