Brice,

Thanks for the information! It’s good to know it wasn’t a flaw in the upgrade. 
This bug must have been introduced in kernel 3.x. I ran lstopo on one of our 
servers that still runs CentOS 6.5, and it correctly reports one L3 cache for 
every 6 cores, as shown below.

We have 75 servers with the exact same specifications. I had upgraded only two 
when I came across this problem during testing. Since I have a correct map on 
the non-upgraded servers, can I use that map on the upgraded servers somehow? 
Essentially, hard-code it?

----------------------- FROM CentOS 6.5 -----------------------
  Socket L#0 (P#0 total=134215604KB CPUModel="AMD Opteron(tm) Processor 6344    
             " CPUType=x86_64)
    NUMANode L#0 (P#0 local=67106740KB total=67106740KB)
      L3Cache L#0 (size=6144KB linesize=64 ways=64)
        L2Cache L#0 (size=2048KB linesize=64 ways=16)
          L1iCache L#0 (size=64KB linesize=64 ways=2)
            L1dCache L#0 (size=16KB linesize=64 ways=4)
              Core L#0 (P#0)
                PU L#0 (P#0)
            L1dCache L#1 (size=16KB linesize=64 ways=4)
              Core L#1 (P#1)
                PU L#1 (P#1)
        L2Cache L#1 (size=2048KB linesize=64 ways=16)
          L1iCache L#1 (size=64KB linesize=64 ways=2)
            L1dCache L#2 (size=16KB linesize=64 ways=4)
              Core L#2 (P#2)
                PU L#2 (P#2)
            L1dCache L#3 (size=16KB linesize=64 ways=4)
              Core L#3 (P#3)
                PU L#3 (P#3)
        L2Cache L#2 (size=2048KB linesize=64 ways=16)
          L1iCache L#2 (size=64KB linesize=64 ways=2)
            L1dCache L#4 (size=16KB linesize=64 ways=4)
              Core L#4 (P#4)
                PU L#4 (P#4)
            L1dCache L#5 (size=16KB linesize=64 ways=4)
              Core L#5 (P#5)
                PU L#5 (P#5)
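
A side note on reading the error quoted further down: here is a minimal Python 
sketch (my own illustration, not part of hwloc; the function names are made up) 
that decodes the two cpuset masks from that message. It assumes bit i of a mask 
stands for PU P#i. Under that assumption the L3 mask 0x000003f0 covers PUs 4-9 
while NUMANode P#0 (0x0000003f) covers PUs 0-5, so they overlap on PUs 4 and 5 
without either containing the other.

```python
def cpuset_to_pus(mask):
    """Return the PU indexes whose bits are set in a cpuset mask.

    Assumption: bit i of the mask corresponds to PU P#i.
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def intersects_without_inclusion(a, b):
    """True if the two masks overlap but neither one contains the other."""
    overlap = a & b
    return overlap != 0 and overlap != a and overlap != b


if __name__ == "__main__":
    l3_mask = 0x000003F0    # L3 cpuset from the error message
    numa_mask = 0x0000003F  # NUMANode P#0 cpuset from the error message

    print("L3 PUs:      ", cpuset_to_pus(l3_mask))
    print("NUMANode PUs:", cpuset_to_pus(numa_mask))
    print("Overlap PUs: ", cpuset_to_pus(l3_mask & numa_mask))
    print("Intersects without inclusion:",
          intersects_without_inclusion(l3_mask, numa_mask))
```

On the CentOS 6.5 output above, the first L3 sits over cores 0-5, which is 
exactly the NUMA node's set, so the same check would come back false there.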

> On Jan 7, 2016, at 1:22 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> Hello
> 
> This is a kernel bug for 12-core AMD Bulldozer/Piledriver (62xx/63xx) 
> processors. hwloc is just complaining about buggy L3 information. lstopo 
> should report one L3 above each set of 6 cores below each NUMA node. Instead 
> you get strange L3s with 2, 4 or 6 cores.
> 
> If you're not binding tasks based on L3 locality and if your applications do 
> not care about L3, you can pass HWLOC_HIDE_ERRORS=1 in the environment to 
> hide the message.
> 
> AMD was working on a kernel patch, but it doesn't seem to be in upstream 
> Linux yet. In hwloc v1.11.2, you can work around the problem by passing 
> HWLOC_COMPONENTS=x86 in the environment.
> 
> I am not sure why CentOS 6.5 didn't complain. That 2.6.32 kernel should be 
> buggy too, and old hwloc releases already complained about such bugs.
> 
> thanks
> Brice
> 
> Le 07/01/2016 04:10, David Winslow a écrit :
>> I upgraded our servers from CentOS 6.5 to CentOS 7.2. Since then, when I run 
>> mpirun I get the following error, but the software continues to run and 
>> appears to work fine.
>> 
>> * hwloc 1.11.0rc3-git has encountered what looks like an error from the 
>> operating system.
>> *
>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) 
>> without inclusion!
>> * Error occurred in topology.c line 983
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing 
>> list,
>> * along with the output+tarball generated by the hwloc-gather-topology 
>> script.
>> 
>> I can replicate the error by simply running hwloc-info.
>> 
>> The hwloc version used with mpirun is 1.9. The version installed on the 
>> server, which I ran directly, is 1.7, the one that ships with CentOS 7. Both 
>> give the error, with the minor differences shown below.
>> 
>> With hwloc 1.7
>> * object (L3 cpuset 0x000003f0) intersection without inclusion!
>> * Error occurred in topology.c line 753
>> 
>> With hwloc 1.9
>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) 
>> without inclusion!
>> * Error occurred in topology.c line 983
>> 
>> The current kernel is 3.10.0-327.el7.x86_64. I’ve tried updating to a newer 
>> minor kernel release and even tried installing kernel v4.4.3. Neither fixed 
>> the problem. Again, hwloc works fine on CentOS 6.5 with kernel 
>> 2.6.32-431.29.2.el6.x86_64.
>> 
>> I’ve attached the files generated by hwloc-gather-topology.sh.  I compared 
>> what this script says is the expected output to the actual output and, from 
>> what I can tell, they look the same. Maybe I’m missing something after 
>> staring all day at the information.
>> 
>> I did a clean install of the OS to perform the upgrade from 6.5.
>> 
>> Any help will be greatly appreciated.
>> 
>> _______________________________________________
>> hwloc-users mailing list
>> 
>> hwloc-us...@open-mpi.org
>> 
>> Subscription: 
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1238.php
> 
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1240.php
