Brice,

Thanks for the information! It's good to know it wasn't a flaw in the upgrade. This bug must have been introduced in kernel 3.x. I ran lstopo on one of our servers that still runs CentOS 6.5, and it correctly reports one L3 cache per 6 cores, as shown below.

We have 75 servers with exactly the same specifications. I had upgraded only two of them when I came across this problem during testing. Since the non-upgraded servers report a correct map, is there some way to use that map on the upgraded servers, essentially hard-coding it?

----------------------- FROM CentOS 6.5 -----------------------
Socket L#0 (P#0 total=134215604KB CPUModel="AMD Opteron(tm) Processor 6344 " CPUType=x86_64)
  NUMANode L#0 (P#0 local=67106740KB total=67106740KB)
    L3Cache L#0 (size=6144KB linesize=64 ways=64)
      L2Cache L#0 (size=2048KB linesize=64 ways=16)
        L1iCache L#0 (size=64KB linesize=64 ways=2)
          L1dCache L#0 (size=16KB linesize=64 ways=4)
            Core L#0 (P#0)
              PU L#0 (P#0)
          L1dCache L#1 (size=16KB linesize=64 ways=4)
            Core L#1 (P#1)
              PU L#1 (P#1)
      L2Cache L#1 (size=2048KB linesize=64 ways=16)
        L1iCache L#1 (size=64KB linesize=64 ways=2)
          L1dCache L#2 (size=16KB linesize=64 ways=4)
            Core L#2 (P#2)
              PU L#2 (P#2)
          L1dCache L#3 (size=16KB linesize=64 ways=4)
            Core L#3 (P#3)
              PU L#3 (P#3)
      L2Cache L#2 (size=2048KB linesize=64 ways=16)
        L1iCache L#2 (size=64KB linesize=64 ways=2)
          L1dCache L#4 (size=16KB linesize=64 ways=4)
            Core L#4 (P#4)
              PU L#4 (P#4)
          L1dCache L#5 (size=16KB linesize=64 ways=4)
            Core L#5 (P#5)
              PU L#5 (P#5)

> On Jan 7, 2016, at 1:22 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Hello
>
> This is a kernel bug for 12-core AMD Bulldozer/Piledriver (62xx/63xx)
> processors. hwloc is just complaining about buggy L3 information. lstopo
> should report one L3 above each set of 6 cores below each NUMA node. Instead
> you get strange L3s with 2, 4 or 6 cores.
>
> If you're not binding tasks based on L3 locality and if your applications do
> not care about L3, you can pass HWLOC_HIDE_ERRORS=1 in the environment to
> hide the message.
>
> AMD was working on a kernel patch, but it doesn't seem to be in the upstream
> Linux kernel yet. In hwloc v1.11.2, you can work around the problem by passing
> HWLOC_COMPONENTS=x86 in the environment.
>
> I am not sure why CentOS 6.5 didn't complain.
> That 2.6.32 kernel should be buggy too, and old hwloc releases already
> complained about such bugs.
>
> thanks
> Brice
>
>
> On 07/01/2016 04:10, David Winslow wrote:
>> I upgraded our servers from CentOS 6.5 to CentOS 7.2. Since then, whenever I
>> run mpirun I get the following error, although the software continues to run
>> and appears to work fine.
>>
>> * hwloc 1.11.0rc3-git has encountered what looks like an error from the operating system.
>> *
>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
>> * Error occurred in topology.c line 983
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> * What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing list,
>> * along with the output+tarball generated by the hwloc-gather-topology script.
>>
>> I can replicate the error simply by running hwloc-info.
>>
>> The hwloc version used by mpirun is 1.9; the version installed on the server,
>> which ships with CentOS 7, is 1.7. Both report the error, with the minor
>> differences shown below.
>>
>> With hwloc 1.7:
>> * object (L3 cpuset 0x000003f0) intersection without inclusion!
>> * Error occurred in topology.c line 753
>>
>> With hwloc 1.9:
>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
>> * Error occurred in topology.c line 983
>>
>> The current kernel is 3.10.0-327.el7.x86_64. I've tried a minor kernel
>> update and even installed kernel v4.4.3, but the error occurs with every one
>> of these kernels. Again, hwloc works fine on CentOS 6.5 with kernel
>> 2.6.32-431.29.2.el6.x86_64.
>>
>> I've attached the files generated by hwloc-gather-topology.sh. I compared
>> what the script says is the expected output with the actual output and, as
>> far as I can tell, they look the same.
>> Maybe I'm missing something after staring at it all day.
>>
>> I did a clean install of the OS to perform the upgrade from 6.5.
>>
>> I've attached the results of the hwloc-gather-topology.sh script. Any help
>> would be greatly appreciated.
>>
>> _______________________________________________
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> Link to this post: http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1238.php
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post: http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1240.php
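To make the "intersects ... without inclusion" complaint concrete: a hwloc cpuset is a bitmask in which bit i set means PU P#i belongs to the object, and hwloc can only place two objects in its tree if their cpusets are either nested or disjoint. The following is a small illustrative Python sketch, not part of hwloc; the helper names `pus` and `relation` are hypothetical. It decodes the two masks from the error message above and, for comparison, the 6-PU L3 that the CentOS 6.5 lstopo output shows under NUMANode P#0.

```python
def pus(mask: int) -> list:
    """Return the OS indexes (P#) of the PUs whose bits are set in a cpuset mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def relation(a: int, b: int) -> str:
    """Classify two cpusets by whether they can coexist in a tree (hypothetical helper)."""
    common = a & b
    if common == 0:
        return "disjoint"
    if common == a or common == b:
        return "included"  # one object can be placed inside the other
    return "intersects without inclusion"  # no valid parent/child placement


buggy_l3 = 0x000003F0     # L3 cpuset from the error message
numa0 = 0x0000003F        # NUMANode P#0 cpuset from the error message
expected_l3 = 0x0000003F  # L3 L#0 on CentOS 6.5 covers PUs P#0-P#5

print("buggy L3 PUs:", pus(buggy_l3))
print("NUMANode PUs:", pus(numa0))
print("shared PUs:  ", pus(buggy_l3 & numa0))
print("buggy L3 vs NUMANode:   ", relation(buggy_l3, numa0))
print("expected L3 vs NUMANode:", relation(expected_l3, numa0))
```

The reported L3 spans PUs 4 to 9 while NUMANode P#0 spans PUs 0 to 5, so they share only PUs 4 and 5; neither contains the other, which is exactly the situation hwloc refuses to build a tree from. The L3 seen on CentOS 6.5 covers the same six PUs as the NUMA node, so it nests cleanly.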
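Brice's two workarounds are plain environment variables, so they can be scoped to a single command. Below is a minimal Python sketch that runs hwloc-info (the command David used to reproduce the error) with one of them applied. It assumes hwloc-info is on PATH; `WORKAROUNDS`, `hwloc_env` and `run_hwloc_info` are hypothetical names. Per the thread, HWLOC_HIDE_ERRORS=1 only silences the message (the L3 information stays wrong), while HWLOC_COMPONENTS=x86 requires hwloc v1.11.2.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional

# The two variables suggested in the thread, with the values given there.
WORKAROUNDS = {
    # Only hides the "operating system" warning; the topology is unchanged.
    "hide": {"HWLOC_HIDE_ERRORS": "1"},
    # Per the thread, works around the buggy L3 information in hwloc v1.11.2.
    "x86": {"HWLOC_COMPONENTS": "x86"},
}


def hwloc_env(mode: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with one workaround added."""
    env = dict(os.environ if base is None else base)
    env.update(WORKAROUNDS[mode])
    return env


def run_hwloc_info(mode: str) -> Optional[str]:
    """Run hwloc-info with the chosen workaround.

    Returns its stderr (where hwloc prints its warnings), or None if
    hwloc-info is not installed.
    """
    exe = shutil.which("hwloc-info")
    if exe is None:
        return None
    result = subprocess.run([exe], env=hwloc_env(mode),
                            capture_output=True, text=True)
    return result.stderr
```

Passing a modified copy of the environment to the child process, instead of mutating `os.environ`, keeps the workaround confined to that one command, which makes it easy to compare the output with and without it.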