Wait, I'm sorry, I must be missing something. Please bear with me!
>By the way, your discussion of groups 1 and 2 below is wrong. Group 2 doesn't
>say that NUMA node == socket, and it doesn't report 8 sockets of 8 cores each.
>It reports 4 sockets, each containing 2 NUMA nodes of 8 cores each, and
>that's likely what you have here (AMD Opteron 6300 or 6200 processors?).
Output of lstopo from nodes of both BIOS versions seems to indicate that there
are 4 sockets, but slurm is reporting on NUMA nodes, no? If not, which version
of the BIOS is correct?
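To compare what lstopo reports on the two groups, one rough approach is to count the Socket, NUMANode and Core objects in its text output. The sketch below is only an illustration: the function name `count_objects` and the abbreviated `sample` excerpt are made up for this example, and it assumes each object is printed once as `<Kind> L#<index>`, as in the attached dumps.

```python
import re

def count_objects(lstopo_text):
    """Count Socket, NUMANode and Core objects in lstopo text output."""
    counts = {}
    for kind in ("Socket", "NUMANode", "Core"):
        # Assumption: every object appears exactly once as "<Kind> L#<logical index>".
        counts[kind] = len(re.findall(rf"\b{kind} L#\d+", lstopo_text))
    return counts

# Abbreviated, hypothetical excerpt: one socket holding two NUMA nodes,
# with a single core kept under each node to keep the sample short.
sample = """\
Machine (256GB)
Socket L#0 (64GB)
NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
L2 L#0 (2048KB) + L1i L#0 (64KB)
L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
L2 L#4 (2048KB) + L1i L#4 (64KB)
L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
"""

counts = count_objects(sample)
print(counts)  # {'Socket': 1, 'NUMANode': 2, 'Core': 2}
```

Run on the two full dumps attached to this message, the same counting should give 4 sockets, 4 NUMA nodes and 64 cores for the first, and 4 sockets, 8 NUMA nodes and 64 cores for the second.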
SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
>>>This message indicates that slurm believes the hardware actually has 8
>>>sockets and 8 cores per socket, no?
>>>
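The arithmetic behind that message can be checked with a small sketch. It assumes, following the reading above, that each `Name=A:B(hw)` token pairs the configured value `A` with the value `B` detected from the hardware; the helper name `parse_mismatch` is hypothetical and not part of slurm.

```python
import re

def parse_mismatch(message):
    """Return {field: (configured, detected)} for every 'Name=A:B(hw)' token."""
    # Assumption: A is the slurm.conf value and B is what was read from the hardware.
    pattern = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")
    return {name: (int(conf), int(hw)) for name, conf, hw in pattern.findall(message)}

message = "SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)"
fields = parse_mismatch(message)

# Total cores implied by each view: sockets * cores per socket.
configured_cores = fields["SocketsPerBoard"][0] * fields["CoresPerSocket"][0]
detected_cores = fields["SocketsPerBoard"][1] * fields["CoresPerSocket"][1]

print(fields)                            # {'SocketsPerBoard': (4, 8), 'CoresPerSocket': (16, 8)}
print(configured_cores, detected_cores)  # 64 64
```

Both views add up to 64 cores, so the disagreement is only about how the cores are grouped: 4 groups of 16 in the config versus 8 groups of 8 as detected.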
Complete lstopo output for groups 1 and 2 is attached for clarity.
If there is a problem with the BIOS, I'd like to correct it, so please let me
know if the BIOS is actually at fault here.
Thanks!
Craig
On Wednesday, May 28, 2014 4:01 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
On 28/05/2014 14:57, Craig Kapfer wrote:
>
>
>Hmm ... the slurm config defines that all nodes have 4
>sockets with 16 cores per socket (which corresponds to
>the hardware--all nodes are the same). Slurm node
>config is as follows:
>
>
>NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerSocket=16
>ThreadsPerCore=1 State=UNKNOWN Port=[17001-17008]
>
>
>But we get this error--so I suspect it's a parsing error on the slurm side?
No, it's slurm properly reading info from hwloc, but that info
doesn't match the actual hardware because the BIOS is buggy.
Brice
Machine (128GB)
NUMANode L#0 (P#0 32GB) + Socket L#0
L3 L#0 (6144KB)
L2 L#0 (2048KB) + L1i L#0 (64KB)
L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
L2 L#1 (2048KB) + L1i L#1 (64KB)
L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
L2 L#2 (2048KB) + L1i L#2 (64KB)
L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
L2 L#3 (2048KB) + L1i L#3 (64KB)
L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
L3 L#1 (6144KB)
L2 L#4 (2048KB) + L1i L#4 (64KB)
L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
L2 L#5 (2048KB) + L1i L#5 (64KB)
L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
L2 L#6 (2048KB) + L1i L#6 (64KB)
L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
L2 L#7 (2048KB) + L1i L#7 (64KB)
L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
NUMANode L#1 (P#2 32GB) + Socket L#1
L3 L#2 (6144KB)
L2 L#8 (2048KB) + L1i L#8 (64KB)
L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
L2 L#9 (2048KB) + L1i L#9 (64KB)
L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
L2 L#10 (2048KB) + L1i L#10 (64KB)
L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
L2 L#11 (2048KB) + L1i L#11 (64KB)
L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
L3 L#3 (6144KB)
L2 L#12 (2048KB) + L1i L#12 (64KB)
L1d L#24 (16KB) + Core L#24 + PU L#24 (P#24)
L1d L#25 (16KB) + Core L#25 + PU L#25 (P#25)
L2 L#13 (2048KB) + L1i L#13 (64KB)
L1d L#26 (16KB) + Core L#26 + PU L#26 (P#26)
L1d L#27 (16KB) + Core L#27 + PU L#27 (P#27)
L2 L#14 (2048KB) + L1i L#14 (64KB)
L1d L#28 (16KB) + Core L#28 + PU L#28 (P#28)
L1d L#29 (16KB) + Core L#29 + PU L#29 (P#29)
L2 L#15 (2048KB) + L1i L#15 (64KB)
L1d L#30 (16KB) + Core L#30 + PU L#30 (P#30)
L1d L#31 (16KB) + Core L#31 + PU L#31 (P#31)
NUMANode L#2 (P#4 32GB) + Socket L#2
L3 L#4 (6144KB)
L2 L#16 (2048KB) + L1i L#16 (64KB)
L1d L#32 (16KB) + Core L#32 + PU L#32 (P#32)
L1d L#33 (16KB) + Core L#33 + PU L#33 (P#33)
L2 L#17 (2048KB) + L1i L#17 (64KB)
L1d L#34 (16KB) + Core L#34 + PU L#34 (P#34)
L1d L#35 (16KB) + Core L#35 + PU L#35 (P#35)
L2 L#18 (2048KB) + L1i L#18 (64KB)
L1d L#36 (16KB) + Core L#36 + PU L#36 (P#36)
L1d L#37 (16KB) + Core L#37 + PU L#37 (P#37)
L2 L#19 (2048KB) + L1i L#19 (64KB)
L1d L#38 (16KB) + Core L#38 + PU L#38 (P#38)
L1d L#39 (16KB) + Core L#39 + PU L#39 (P#39)
L3 L#5 (6144KB)
L2 L#20 (2048KB) + L1i L#20 (64KB)
L1d L#40 (16KB) + Core L#40 + PU L#40 (P#40)
L1d L#41 (16KB) + Core L#41 + PU L#41 (P#41)
L2 L#21 (2048KB) + L1i L#21 (64KB)
L1d L#42 (16KB) + Core L#42 + PU L#42 (P#42)
L1d L#43 (16KB) + Core L#43 + PU L#43 (P#43)
L2 L#22 (2048KB) + L1i L#22 (64KB)
L1d L#44 (16KB) + Core L#44 + PU L#44 (P#44)
L1d L#45 (16KB) + Core L#45 + PU L#45 (P#45)
L2 L#23 (2048KB) + L1i L#23 (64KB)
L1d L#46 (16KB) + Core L#46 + PU L#46 (P#46)
L1d L#47 (16KB) + Core L#47 + PU L#47 (P#47)
NUMANode L#3 (P#6 32GB) + Socket L#3
L3 L#6 (6144KB)
L2 L#24 (2048KB) + L1i L#24 (64KB)
L1d L#48 (16KB) + Core L#48 + PU L#48 (P#48)
L1d L#49 (16KB) + Core L#49 + PU L#49 (P#49)
L2 L#25 (2048KB) + L1i L#25 (64KB)
L1d L#50 (16KB) + Core L#50 + PU L#50 (P#50)
L1d L#51 (16KB) + Core L#51 + PU L#51 (P#51)
L2 L#26 (2048KB) + L1i L#26 (64KB)
L1d L#52 (16KB) + Core L#52 + PU L#52 (P#52)
L1d L#53 (16KB) + Core L#53 + PU L#53 (P#53)
L2 L#27 (2048KB) + L1i L#27 (64KB)
L1d L#54 (16KB) + Core L#54 + PU L#54 (P#54)
L1d L#55 (16KB) + Core L#55 + PU L#55 (P#55)
L3 L#7 (6144KB)
L2 L#28 (2048KB) + L1i L#28 (64KB)
L1d L#56 (16KB) + Core L#56 + PU L#56 (P#56)
L1d L#57 (16KB) + Core L#57 + PU L#57 (P#57)
L2 L#29 (2048KB) + L1i L#29 (64KB)
L1d L#58 (16KB) + Core L#58 + PU L#58 (P#58)
L1d L#59 (16KB) + Core L#59 + PU L#59 (P#59)
L2 L#30 (2048KB) + L1i L#30 (64KB)
L1d L#60 (16KB) + Core L#60 + PU L#60 (P#60)
L1d L#61 (16KB) + Core L#61 + PU L#61 (P#61)
L2 L#31 (2048KB) + L1i L#31 (64KB)
L1d L#62 (16KB) + Core L#62 + PU L#62 (P#62)
L1d L#63 (16KB) + Core L#63 + PU L#63 (P#63)
HostBridge L#0
PCIBridge
PCI 15b3:1003
PCIBridge
PCI 8086:10e7
Net L#0 "eth0"
PCI 8086:10e7
Net L#1 "eth1"
PCI 1002:4390
Block L#2 "sda"
Block L#3 "sdb"
PCI 1002:439c
PCIBridge
PCI 102b:0532
Machine (256GB)
Socket L#0 (64GB)
NUMANode L#0 (P#0 32GB) + L3 L#0 (6144KB)
L2 L#0 (2048KB) + L1i L#0 (64KB)
L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
L2 L#1 (2048KB) + L1i L#1 (64KB)
L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
L2 L#2 (2048KB) + L1i L#2 (64KB)
L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
L2 L#3 (2048KB) + L1i L#3 (64KB)
L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
NUMANode L#1 (P#1 32GB) + L3 L#1 (6144KB)
L2 L#4 (2048KB) + L1i L#4 (64KB)
L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
L2 L#5 (2048KB) + L1i L#5 (64KB)
L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
L2 L#6 (2048KB) + L1i L#6 (64KB)
L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
L2 L#7 (2048KB) + L1i L#7 (64KB)
L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
Socket L#1 (64GB)
NUMANode L#2 (P#2 32GB) + L3 L#2 (6144KB)
L2 L#8 (2048KB) + L1i L#8 (64KB)
L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
L2 L#9 (2048KB) + L1i L#9 (64KB)
L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
L2 L#10 (2048KB) + L1i L#10 (64KB)
L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
L2 L#11 (2048KB) + L1i L#11 (64KB)
L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
NUMANode L#3 (P#3 32GB) + L3 L#3 (6144KB)
L2 L#12 (2048KB) + L1i L#12 (64KB)
L1d L#24 (16KB) + Core L#24 + PU L#24 (P#24)
L1d L#25 (16KB) + Core L#25 + PU L#25 (P#25)
L2 L#13 (2048KB) + L1i L#13 (64KB)
L1d L#26 (16KB) + Core L#26 + PU L#26 (P#26)
L1d L#27 (16KB) + Core L#27 + PU L#27 (P#27)
L2 L#14 (2048KB) + L1i L#14 (64KB)
L1d L#28 (16KB) + Core L#28 + PU L#28 (P#28)
L1d L#29 (16KB) + Core L#29 + PU L#29 (P#29)
L2 L#15 (2048KB) + L1i L#15 (64KB)
L1d L#30 (16KB) + Core L#30 + PU L#30 (P#30)
L1d L#31 (16KB) + Core L#31 + PU L#31 (P#31)
Socket L#2 (64GB)
NUMANode L#4 (P#4 32GB) + L3 L#4 (6144KB)
L2 L#16 (2048KB) + L1i L#16 (64KB)
L1d L#32 (16KB) + Core L#32 + PU L#32 (P#32)
L1d L#33 (16KB) + Core L#33 + PU L#33 (P#33)
L2 L#17 (2048KB) + L1i L#17 (64KB)
L1d L#34 (16KB) + Core L#34 + PU L#34 (P#34)
L1d L#35 (16KB) + Core L#35 + PU L#35 (P#35)
L2 L#18 (2048KB) + L1i L#18 (64KB)
L1d L#36 (16KB) + Core L#36 + PU L#36 (P#36)
L1d L#37 (16KB) + Core L#37 + PU L#37 (P#37)
L2 L#19 (2048KB) + L1i L#19 (64KB)
L1d L#38 (16KB) + Core L#38 + PU L#38 (P#38)
L1d L#39 (16KB) + Core L#39 + PU L#39 (P#39)
NUMANode L#5 (P#5 32GB) + L3 L#5 (6144KB)
L2 L#20 (2048KB) + L1i L#20 (64KB)
L1d L#40 (16KB) + Core L#40 + PU L#40 (P#40)
L1d L#41 (16KB) + Core L#41 + PU L#41 (P#41)
L2 L#21 (2048KB) + L1i L#21 (64KB)
L1d L#42 (16KB) + Core L#42 + PU L#42 (P#42)
L1d L#43 (16KB) + Core L#43 + PU L#43 (P#43)
L2 L#22 (2048KB) + L1i L#22 (64KB)
L1d L#44 (16KB) + Core L#44 + PU L#44 (P#44)
L1d L#45 (16KB) + Core L#45 + PU L#45 (P#45)
L2 L#23 (2048KB) + L1i L#23 (64KB)
L1d L#46 (16KB) + Core L#46 + PU L#46 (P#46)
L1d L#47 (16KB) + Core L#47 + PU L#47 (P#47)
Socket L#3 (64GB)
NUMANode L#6 (P#6 32GB) + L3 L#6 (6144KB)
L2 L#24 (2048KB) + L1i L#24 (64KB)
L1d L#48 (16KB) + Core L#48 + PU L#48 (P#48)
L1d L#49 (16KB) + Core L#49 + PU L#49 (P#49)
L2 L#25 (2048KB) + L1i L#25 (64KB)
L1d L#50 (16KB) + Core L#50 + PU L#50 (P#50)
L1d L#51 (16KB) + Core L#51 + PU L#51 (P#51)
L2 L#26 (2048KB) + L1i L#26 (64KB)
L1d L#52 (16KB) + Core L#52 + PU L#52 (P#52)
L1d L#53 (16KB) + Core L#53 + PU L#53 (P#53)
L2 L#27 (2048KB) + L1i L#27 (64KB)
L1d L#54 (16KB) + Core L#54 + PU L#54 (P#54)
L1d L#55 (16KB) + Core L#55 + PU L#55 (P#55)
NUMANode L#7 (P#7 32GB) + L3 L#7 (6144KB)
L2 L#28 (2048KB) + L1i L#28 (64KB)
L1d L#56 (16KB) + Core L#56 + PU L#56 (P#56)
L1d L#57 (16KB) + Core L#57 + PU L#57 (P#57)
L2 L#29 (2048KB) + L1i L#29 (64KB)
L1d L#58 (16KB) + Core L#58 + PU L#58 (P#58)
L1d L#59 (16KB) + Core L#59 + PU L#59 (P#59)
L2 L#30 (2048KB) + L1i L#30 (64KB)
L1d L#60 (16KB) + Core L#60 + PU L#60 (P#60)
L1d L#61 (16KB) + Core L#61 + PU L#61 (P#61)
L2 L#31 (2048KB) + L1i L#31 (64KB)
L1d L#62 (16KB) + Core L#62 + PU L#62 (P#62)
L1d L#63 (16KB) + Core L#63 + PU L#63 (P#63)
HostBridge L#0
PCIBridge
PCI 15b3:1003
PCIBridge
PCI 8086:10e7
Net L#0 "eth0"
PCI 8086:10e7
Net L#1 "eth1"
PCI 1002:4390
Block L#2 "sda"
Block L#3 "sdb"
PCI 1002:439c
PCIBridge
PCI 102b:0532