Wait, I'm sorry, I must be missing something, please bear with me!

>By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't
>say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each.
>It reports 4 sockets, each containing 2 NUMA nodes of 8 cores each, and
>that's likely what you have here (AMD Opteron 6300 or 6200 processors?).

The lstopo output from nodes with both BIOS versions seems to indicate that
there are 4 sockets, but Slurm is reporting on NUMA nodes, no? If not, which
version of the BIOS is correct?


SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
>>>This message indicates that Slurm believes the hardware actually has 8
>>>sockets and 8 cores per socket, no?
>>>
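
For reference, here is a minimal sketch of how the fields in that message can be split apart. It assumes the usual Slurm convention that each field reads name=configured:detected(hw), which matches the Sockets=4 and CoresPerSocket=16 values in the node config quoted further down. The function name parse_hw_mismatch is made up for illustration and is not part of Slurm.

```python
import re

def parse_hw_mismatch(message):
    """Split fields like 'Name=4:8(hw)' into (configured, detected) pairs."""
    pattern = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")
    return {name: (int(conf), int(hw))
            for name, conf, hw in pattern.findall(message)}

fields = parse_hw_mismatch("SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)")
for name, (configured, detected) in fields.items():
    print(f"{name}: configured={configured}, detected={detected}")
```

Read this way, the configured values are 4 sockets of 16 cores and the detected values are 8 sockets of 8 cores; both multiply out to 64 cores.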

Complete lstopo output for groups 1 and 2 is attached for clarity.

If there is a problem with the BIOS, I'd like to correct it, so please let me
know whether the BIOS is actually at fault here.

Thanks!

Craig


On Wednesday, May 28, 2014 4:01 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
 


On 28/05/2014 14:57, Craig Kapfer wrote:


>Hmm ... the Slurm config defines that all nodes have 4 sockets with 16 cores
>per socket (which corresponds to the hardware--all nodes are the same).
>Slurm node config is as follows:
>
>NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16 
>ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]
>
>But we get this error--so I suspect it's a parsing error on the Slurm side?

No, it's Slurm properly reading info from hwloc, but that info
doesn't match the actual hardware because the BIOS is buggy.


Brice
Machine (128GB)
  NUMANode L#0 (P#0 32GB) + Socket L#0
    L3 L#0 (6144KB)
      L2 L#0 (2048KB) + L1i L#0 (64KB)
        L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
        L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
      L2 L#1 (2048KB) + L1i L#1 (64KB)
        L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
        L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
      L2 L#2 (2048KB) + L1i L#2 (64KB)
        L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
        L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
      L2 L#3 (2048KB) + L1i L#3 (64KB)
        L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
        L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
    L3 L#1 (6144KB)
      L2 L#4 (2048KB) + L1i L#4 (64KB)
        L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
        L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
      L2 L#5 (2048KB) + L1i L#5 (64KB)
        L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
        L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
      L2 L#6 (2048KB) + L1i L#6 (64KB)
        L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
        L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
      L2 L#7 (2048KB) + L1i L#7 (64KB)
        L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
        L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
  NUMANode L#1 (P#2 32GB) + Socket L#1
    L3 L#2 (6144KB)
      L2 L#8 (2048KB) + L1i L#8 (64KB)
        L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
        L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
      L2 L#9 (2048KB) + L1i L#9 (64KB)
        L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
        L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
      L2 L#10 (2048KB) + L1i L#10 (64KB)
        L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
        L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
      L2 L#11 (2048KB) + L1i L#11 (64KB)
        L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
        L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
    L3 L#3 (6144KB)
      L2 L#12 (2048KB) + L1i L#12 (64KB)
        L1d L#24 (16KB) + Core L#24 + PU L#24 (P#24)
        L1d L#25 (16KB) + Core L#25 + PU L#25 (P#25)
      L2 L#13 (2048KB) + L1i L#13 (64KB)
        L1d L#26 (16KB) + Core L#26 + PU L#26 (P#26)
        L1d L#27 (16KB) + Core L#27 + PU L#27 (P#27)
      L2 L#14 (2048KB) + L1i L#14 (64KB)
        L1d L#28 (16KB) + Core L#28 + PU L#28 (P#28)
        L1d L#29 (16KB) + Core L#29 + PU L#29 (P#29)
      L2 L#15 (2048KB) + L1i L#15 (64KB)
        L1d L#30 (16KB) + Core L#30 + PU L#30 (P#30)
        L1d L#31 (16KB) + Core L#31 + PU L#31 (P#31)
  NUMANode L#2 (P#4 32GB) + Socket L#2
    L3 L#4 (6144KB)
      L2 L#16 (2048KB) + L1i L#16 (64KB)
        L1d L#32 (16KB) + Core L#32 + PU L#32 (P#32)
        L1d L#33 (16KB) + Core L#33 + PU L#33 (P#33)
      L2 L#17 (2048KB) + L1i L#17 (64KB)
        L1d L#34 (16KB) + Core L#34 + PU L#34 (P#34)
        L1d L#35 (16KB) + Core L#35 + PU L#35 (P#35)
      L2 L#18 (2048KB) + L1i L#18 (64KB)
        L1d L#36 (16KB) + Core L#36 + PU L#36 (P#36)
        L1d L#37 (16KB) + Core L#37 + PU L#37 (P#37)
      L2 L#19 (2048KB) + L1i L#19 (64KB)
        L1d L#38 (16KB) + Core L#38 + PU L#38 (P#38)
        L1d L#39 (16KB) + Core L#39 + PU L#39 (P#39)
    L3 L#5 (6144KB)
      L2 L#20 (2048KB) + L1i L#20 (64KB)
        L1d L#40 (16KB) + Core L#40 + PU L#40 (P#40)
        L1d L#41 (16KB) + Core L#41 + PU L#41 (P#41)
      L2 L#21 (2048KB) + L1i L#21 (64KB)
        L1d L#42 (16KB) + Core L#42 + PU L#42 (P#42)
        L1d L#43 (16KB) + Core L#43 + PU L#43 (P#43)
      L2 L#22 (2048KB) + L1i L#22 (64KB)
        L1d L#44 (16KB) + Core L#44 + PU L#44 (P#44)
        L1d L#45 (16KB) + Core L#45 + PU L#45 (P#45)
      L2 L#23 (2048KB) + L1i L#23 (64KB)
        L1d L#46 (16KB) + Core L#46 + PU L#46 (P#46)
        L1d L#47 (16KB) + Core L#47 + PU L#47 (P#47)
  NUMANode L#3 (P#6 32GB) + Socket L#3
    L3 L#6 (6144KB)
      L2 L#24 (2048KB) + L1i L#24 (64KB)
        L1d L#48 (16KB) + Core L#48 + PU L#48 (P#48)
        L1d L#49 (16KB) + Core L#49 + PU L#49 (P#49)
      L2 L#25 (2048KB) + L1i L#25 (64KB)
        L1d L#50 (16KB) + Core L#50 + PU L#50 (P#50)
        L1d L#51 (16KB) + Core L#51 + PU L#51 (P#51)
      L2 L#26 (2048KB) + L1i L#26 (64KB)
        L1d L#52 (16KB) + Core L#52 + PU L#52 (P#52)
        L1d L#53 (16KB) + Core L#53 + PU L#53 (P#53)
      L2 L#27 (2048KB) + L1i L#27 (64KB)
        L1d L#54 (16KB) + Core L#54 + PU L#54 (P#54)
        L1d L#55 (16KB) + Core L#55 + PU L#55 (P#55)
    L3 L#7 (6144KB)
      L2 L#28 (2048KB) + L1i L#28 (64KB)
        L1d L#56 (16KB) + Core L#56 + PU L#56 (P#56)
        L1d L#57 (16KB) + Core L#57 + PU L#57 (P#57)
      L2 L#29 (2048KB) + L1i L#29 (64KB)
        L1d L#58 (16KB) + Core L#58 + PU L#58 (P#58)
        L1d L#59 (16KB) + Core L#59 + PU L#59 (P#59)
      L2 L#30 (2048KB) + L1i L#30 (64KB)
        L1d L#60 (16KB) + Core L#60 + PU L#60 (P#60)
        L1d L#61 (16KB) + Core L#61 + PU L#61 (P#61)
      L2 L#31 (2048KB) + L1i L#31 (64KB)
        L1d L#62 (16KB) + Core L#62 + PU L#62 (P#62)
        L1d L#63 (16KB) + Core L#63 + PU L#63 (P#63)
  HostBridge L#0
    PCIBridge
      PCI 15b3:1003
    PCIBridge
      PCI 8086:10e7
        Net L#0 "eth0"
      PCI 8086:10e7
        Net L#1 "eth1"
    PCI 1002:4390
      Block L#2 "sda"
      Block L#3 "sdb"
    PCI 1002:439c
    PCIBridge
      PCI 102b:0532
Machine (256GB)
  Socket L#0 (64GB)
    NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
      L2 L#0 (2048KB) + L1i L#0 (64KB)
        L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
        L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
      L2 L#1 (2048KB) + L1i L#1 (64KB)
        L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
        L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
      L2 L#2 (2048KB) + L1i L#2 (64KB)
        L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
        L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
      L2 L#3 (2048KB) + L1i L#3 (64KB)
        L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
        L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
    NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
      L2 L#4 (2048KB) + L1i L#4 (64KB)
        L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
        L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
      L2 L#5 (2048KB) + L1i L#5 (64KB)
        L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
        L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
      L2 L#6 (2048KB) + L1i L#6 (64KB)
        L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
        L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
      L2 L#7 (2048KB) + L1i L#7 (64KB)
        L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
        L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
  Socket L#1 (64GB)
    NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
      L2 L#8 (2048KB) + L1i L#8 (64KB)
        L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
        L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
      L2 L#9 (2048KB) + L1i L#9 (64KB)
        L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
        L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
      L2 L#10 (2048KB) + L1i L#10 (64KB)
        L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
        L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
      L2 L#11 (2048KB) + L1i L#11 (64KB)
        L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
        L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
    NUMANode L#3 (P#3 32GB) + L3 L#3 (6144KB)
      L2 L#12 (2048KB) + L1i L#12 (64KB)
        L1d L#24 (16KB) + Core L#24 + PU L#24 (P#24)
        L1d L#25 (16KB) + Core L#25 + PU L#25 (P#25)
      L2 L#13 (2048KB) + L1i L#13 (64KB)
        L1d L#26 (16KB) + Core L#26 + PU L#26 (P#26)
        L1d L#27 (16KB) + Core L#27 + PU L#27 (P#27)
      L2 L#14 (2048KB) + L1i L#14 (64KB)
        L1d L#28 (16KB) + Core L#28 + PU L#28 (P#28)
        L1d L#29 (16KB) + Core L#29 + PU L#29 (P#29)
      L2 L#15 (2048KB) + L1i L#15 (64KB)
        L1d L#30 (16KB) + Core L#30 + PU L#30 (P#30)
        L1d L#31 (16KB) + Core L#31 + PU L#31 (P#31)
  Socket L#2 (64GB)
    NUMANode L#4 (P#4 32GB) + L3 L#4 (6144KB)
      L2 L#16 (2048KB) + L1i L#16 (64KB)
        L1d L#32 (16KB) + Core L#32 + PU L#32 (P#32)
        L1d L#33 (16KB) + Core L#33 + PU L#33 (P#33)
      L2 L#17 (2048KB) + L1i L#17 (64KB)
        L1d L#34 (16KB) + Core L#34 + PU L#34 (P#34)
        L1d L#35 (16KB) + Core L#35 + PU L#35 (P#35)
      L2 L#18 (2048KB) + L1i L#18 (64KB)
        L1d L#36 (16KB) + Core L#36 + PU L#36 (P#36)
        L1d L#37 (16KB) + Core L#37 + PU L#37 (P#37)
      L2 L#19 (2048KB) + L1i L#19 (64KB)
        L1d L#38 (16KB) + Core L#38 + PU L#38 (P#38)
        L1d L#39 (16KB) + Core L#39 + PU L#39 (P#39)
    NUMANode L#5 (P#5 32GB) + L3 L#5 (6144KB)
      L2 L#20 (2048KB) + L1i L#20 (64KB)
        L1d L#40 (16KB) + Core L#40 + PU L#40 (P#40)
        L1d L#41 (16KB) + Core L#41 + PU L#41 (P#41)
      L2 L#21 (2048KB) + L1i L#21 (64KB)
        L1d L#42 (16KB) + Core L#42 + PU L#42 (P#42)
        L1d L#43 (16KB) + Core L#43 + PU L#43 (P#43)
      L2 L#22 (2048KB) + L1i L#22 (64KB)
        L1d L#44 (16KB) + Core L#44 + PU L#44 (P#44)
        L1d L#45 (16KB) + Core L#45 + PU L#45 (P#45)
      L2 L#23 (2048KB) + L1i L#23 (64KB)
        L1d L#46 (16KB) + Core L#46 + PU L#46 (P#46)
        L1d L#47 (16KB) + Core L#47 + PU L#47 (P#47)
  Socket L#3 (64GB)
    NUMANode L#6 (P#6 32GB) + L3 L#6 (6144KB)
      L2 L#24 (2048KB) + L1i L#24 (64KB)
        L1d L#48 (16KB) + Core L#48 + PU L#48 (P#48)
        L1d L#49 (16KB) + Core L#49 + PU L#49 (P#49)
      L2 L#25 (2048KB) + L1i L#25 (64KB)
        L1d L#50 (16KB) + Core L#50 + PU L#50 (P#50)
        L1d L#51 (16KB) + Core L#51 + PU L#51 (P#51)
      L2 L#26 (2048KB) + L1i L#26 (64KB)
        L1d L#52 (16KB) + Core L#52 + PU L#52 (P#52)
        L1d L#53 (16KB) + Core L#53 + PU L#53 (P#53)
      L2 L#27 (2048KB) + L1i L#27 (64KB)
        L1d L#54 (16KB) + Core L#54 + PU L#54 (P#54)
        L1d L#55 (16KB) + Core L#55 + PU L#55 (P#55)
    NUMANode L#7 (P#7 32GB) + L3 L#7 (6144KB)
      L2 L#28 (2048KB) + L1i L#28 (64KB)
        L1d L#56 (16KB) + Core L#56 + PU L#56 (P#56)
        L1d L#57 (16KB) + Core L#57 + PU L#57 (P#57)
      L2 L#29 (2048KB) + L1i L#29 (64KB)
        L1d L#58 (16KB) + Core L#58 + PU L#58 (P#58)
        L1d L#59 (16KB) + Core L#59 + PU L#59 (P#59)
      L2 L#30 (2048KB) + L1i L#30 (64KB)
        L1d L#60 (16KB) + Core L#60 + PU L#60 (P#60)
        L1d L#61 (16KB) + Core L#61 + PU L#61 (P#61)
      L2 L#31 (2048KB) + L1i L#31 (64KB)
        L1d L#62 (16KB) + Core L#62 + PU L#62 (P#62)
        L1d L#63 (16KB) + Core L#63 + PU L#63 (P#63)
  HostBridge L#0
    PCIBridge
      PCI 15b3:1003
    PCIBridge
      PCI 8086:10e7
        Net L#0 "eth0"
      PCI 8086:10e7
        Net L#1 "eth1"
    PCI 1002:4390
      Block L#2 "sda"
      Block L#3 "sdb"
    PCI 1002:439c
    PCIBridge
      PCI 102b:0532
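
The two listings above can also be compared mechanically rather than by eye. The following is a rough sketch that counts object types in lstopo text output. It assumes the text format shown above, where objects on one line are joined with " + " and each starts with "Type L#n". The name count_objects and the short sample string are illustrative only; in practice the text would come from a saved lstopo output file.

```python
import re
from collections import Counter

# Object types worth counting; names follow the lstopo text output above.
TYPES = ("Socket", "NUMANode", "L3", "Core", "PU")

def count_objects(lstopo_text):
    """Count how many objects of each type appear in lstopo text output."""
    counts = Counter()
    for line in lstopo_text.splitlines():
        # Objects sharing a line are joined with " + ", e.g. "L1d ... + Core ...".
        for part in line.strip().split(" + "):
            match = re.match(r"(\w+) L#\d+", part)
            if match and match.group(1) in TYPES:
                counts[match.group(1)] += 1
    return counts

# Short excerpt in the style of the second listing, for demonstration only.
sample = """\
Socket L#0 (64GB)
  NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
    L2 L#0 (2048KB) + L1i L#0 (64KB)
      L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
      L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
  NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
    L2 L#4 (2048KB) + L1i L#4 (64KB)
      L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
"""
counts = count_objects(sample)
print(dict(counts))
```

Going by the L# indices, the first full listing has 4 Socket, 4 NUMANode and 64 Core objects, while the second has 4 Socket, 8 NUMANode and 64 Core objects. That gives 16 cores per socket in both cases, but 16 versus 8 cores per NUMA node.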
