Dear hwloc experts,

Using hwloc 1.11.13 I receive an "incorrect PCI locality information" error message. The complete message is attached as file "lstopo_1.11.13.err".

I get this error on a dual socket Xeon Platinum 9242 system running CentOS 7.8.

I don't see this error on a dual socket Xeon Gold 6148 system running the same CentOS release (7.8).

If I remember correctly, our dual socket Xeon Platinum 9242 system also did not show this error before it was updated to CentOS 7.8.

So it appears to be the combination of that specific CentOS release (7.8) and that particular CPU type (Xeon Platinum 9242) that triggers the error in hwloc 1.11.13.

With hwloc 2.1.0, however, I do not see any error message. For your reference, I am attaching the XML output files obtained from hwloc 1.11.13 and 2.1.0.

Unfortunately, I cannot switch from hwloc 1.x to 2.x because I need to compile OpenMPI 3.x, which requires hwloc 1.x. Setting HWLOC_HIDE_ERRORS=1 only silences the message, so it is not a real solution.

Could someone please provide a fix for this particular problem in hwloc 1.x?


Thank you in advance -
Christian Tuma

--
Dr. Christian Tuma
Consultant, Supercomputing
Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
+49 30 84185132 | t...@zib.de | www.zib.de
****************************************************************************
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus 0000:40 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_40_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
****************************************************************************
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus 0000:44 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_44_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
****************************************************************************
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus 0000:53 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_53_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
****************************************************************************
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus 0000:62 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_62_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
****************************************************************************
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus 0000:71 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_71_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
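
Each of the five messages above names a per-bus environment variable (HWLOC_PCI_<domain>_<bus>_LOCALCPUS, set to an empty value) that disables the fixup for that bus. As a minimal sketch of how those variables could be assembled for all five buses at once, the Python below builds such an environment. The function name fixup_override_env and the constant AFFECTED_BUSES are hypothetical helpers introduced only for illustration; the variable name pattern and the bus IDs are taken from the messages. Whether disabling the fixup yields the correct locality on this machine is not established here, so this is an experiment rather than a fix.

```python
import os

# Bus IDs copied from the five messages above, in domain:bus form.
AFFECTED_BUSES = ["0000:40", "0000:44", "0000:53", "0000:62", "0000:71"]


def fixup_override_env(buses, base_env=None):
    """Return a copy of the environment with the per-bus fixup disabled.

    Hypothetical helper: it only builds a dict and does not call hwloc.
    """
    env = dict(os.environ if base_env is None else base_env)
    for bus in buses:
        domain, bus_id = bus.split(":")
        # Name pattern as printed in the message:
        # HWLOC_PCI_<domain>_<bus>_LOCALCPUS, set to an empty value.
        env[f"HWLOC_PCI_{domain}_{bus_id}_LOCALCPUS"] = ""
    return env


if __name__ == "__main__":
    # Print the assignments so they can be inspected or pasted into a shell.
    overrides = fixup_override_env(AFFECTED_BUSES, base_env={})
    for name in sorted(overrides):
        print(f"{name}={overrides[name]}")
```

The resulting dict could then be passed as the env argument of subprocess.run when launching lstopo, assuming lstopo from hwloc 1.11.13 is on the PATH, to compare its output with and without the fixup.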

Attachment: lstopo_1.11.13.xml.gz
Description: GNU Zip compressed data

Attachment: lstopo_2.1.0.xml.gz
Description: GNU Zip compressed data
