--- Begin Message ---
Package: src:linux
Version: 3.16.7-ckt2-1
Severity: normal
Dear Maintainer,
On a machine with two Intel Haswell Xeon E5-2697 v3 CPUs, we are
observing a regression in how the CPU topology is detected. With Wheezy
(kernel 3.2.0-4-amd64), Linux detects 2 sockets and prints the
following at boot:
====><===============
Jan 6 15:15:11 pocn001 kernel: [ 0.450629] Booting Node 0, Processors #1
Jan 6 15:15:11 pocn001 kernel: [ 0.455199] smpboot cpu 1: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 0.567069] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 0.573406] #2
Jan 6 15:15:11 pocn001 kernel: [ 0.575160] smpboot cpu 2: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 0.686818] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 0.693158] #3
Jan 6 15:15:11 pocn001 kernel: [ 0.694911] smpboot cpu 3: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 0.806473] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 0.812809] #4
Jan 6 15:15:11 pocn001 kernel: [ 0.814562] smpboot cpu 4: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 0.926220] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 0.932548] #5
Jan 6 15:15:11 pocn001 kernel: [ 0.934302] smpboot cpu 5: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.045959] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.052293] #6
Jan 6 15:15:11 pocn001 kernel: [ 1.054047] smpboot cpu 6: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.165709] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.172099] Ok.
Jan 6 15:15:11 pocn001 kernel: [ 1.174143] Booting Node 1, Processors #7
Jan 6 15:15:11 pocn001 kernel: [ 1.178712] smpboot cpu 7: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.289472] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.295830] #8
Jan 6 15:15:11 pocn001 kernel: [ 1.297584] smpboot cpu 8: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.409242] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.415599] #9
Jan 6 15:15:11 pocn001 kernel: [ 1.417354] smpboot cpu 9: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.529010] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.535350] #10
Jan 6 15:15:11 pocn001 kernel: [ 1.537201] smpboot cpu 10: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.648655] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.654984] #11
Jan 6 15:15:11 pocn001 kernel: [ 1.656835] smpboot cpu 11: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.768484] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.774815] #12
Jan 6 15:15:11 pocn001 kernel: [ 1.776667] smpboot cpu 12: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 1.888219] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 1.894552] #13
Jan 6 15:15:11 pocn001 kernel: [ 1.896403] smpboot cpu 13: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.008055] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.014445] Ok.
Jan 6 15:15:11 pocn001 kernel: [ 2.016491] Booting Node 2, Processors #14
Jan 6 15:15:11 pocn001 kernel: [ 2.021156] smpboot cpu 14: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.131722] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.138096] #15
Jan 6 15:15:11 pocn001 kernel: [ 2.139948] smpboot cpu 15: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.251343] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.257713] #16
Jan 6 15:15:11 pocn001 kernel: [ 2.259564] smpboot cpu 16: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.371119] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.377469] #17
Jan 6 15:15:11 pocn001 kernel: [ 2.379320] smpboot cpu 17: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.490874] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.497218] #18
Jan 6 15:15:11 pocn001 kernel: [ 2.499070] smpboot cpu 18: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.610525] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.616866] #19
Jan 6 15:15:11 pocn001 kernel: [ 2.618717] smpboot cpu 19: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.730272] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.736616] #20
Jan 6 15:15:11 pocn001 kernel: [ 2.738468] smpboot cpu 20: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.850025] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.856412] Ok.
Jan 6 15:15:11 pocn001 kernel: [ 2.858455] Booting Node 3, Processors #21
Jan 6 15:15:11 pocn001 kernel: [ 2.863122] smpboot cpu 21: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 2.973884] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 2.980261] #22
Jan 6 15:15:11 pocn001 kernel: [ 2.982113] smpboot cpu 22: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.093568] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.099939] #23
Jan 6 15:15:11 pocn001 kernel: [ 3.101791] smpboot cpu 23: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.213261] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.219631] #24
Jan 6 15:15:11 pocn001 kernel: [ 3.221483] smpboot cpu 24: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.332984] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.339329] #25
Jan 6 15:15:11 pocn001 kernel: [ 3.341181] smpboot cpu 25: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.452836] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.459188] #26
Jan 6 15:15:11 pocn001 kernel: [ 3.461040] smpboot cpu 26: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.572499] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.578847] #27 Ok.
Jan 6 15:15:11 pocn001 kernel: [ 3.581277] smpboot cpu 27: start_ip = 89000
Jan 6 15:15:11 pocn001 kernel: [ 3.692337] NMI watchdog enabled, takes one hw-pmu counter.
Jan 6 15:15:11 pocn001 kernel: [ 3.698561] Brought up 28 CPUs
Jan 6 15:15:11 pocn001 kernel: [ 3.701962] Total of 28 processors activated (145597.28 BogoMIPS).
====><===============
lscpu gives:
====><===============
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Stepping: 2
CPU MHz: 2601.000
BogoMIPS: 5199.94
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 17920K
NUMA node0 CPU(s): 0-6
NUMA node1 CPU(s): 7-13
NUMA node2 CPU(s): 14-20
NUMA node3 CPU(s): 21-27
====><===============
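The counts reported by lscpu can be cross-checked against sysfs. The following is a minimal Python sketch, not taken from lscpu. It assumes the kernel exports `physical_package_id` and `core_siblings_list` under /sys/devices/system/cpu/cpuN/topology/. The function names `read_topology` and `summarize` are hypothetical. Which of these files lscpu itself consults is not stated in this report.

```python
from pathlib import Path


def read_topology(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu number: (physical_package_id, core_siblings_list)}.

    Assumption: both files exist for every online CPU; CPUs without
    them are skipped silently.
    """
    result = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        topo = cpu_dir / "topology"
        pkg_file = topo / "physical_package_id"
        sib_file = topo / "core_siblings_list"
        if not (pkg_file.is_file() and sib_file.is_file()):
            continue
        cpu = int(cpu_dir.name[3:])  # "cpu12" -> 12
        result[cpu] = (int(pkg_file.read_text()), sib_file.read_text().strip())
    return result


def summarize(topology):
    """Count distinct package ids and distinct core-sibling groups."""
    package_ids = {pkg for pkg, _ in topology.values()}
    sibling_groups = {sibs for _, sibs in topology.values()}
    return len(package_ids), len(sibling_groups)
```

Calling `summarize(read_topology())` returns a pair (distinct package ids, distinct core-sibling groups). If the two numbers disagree, the sibling masks and the package ids describe different topologies, which may help narrow down where detection goes wrong.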
Booting the same machine, or one with identical hardware, with Jessie's
kernel (3.16.0-4-amd64) gives a different result and triggers a warning
when CPU #7 is brought up:
====><===============
Jan 6 13:58:55 pocn501 kernel: [ 0.444912] x86: Booting SMP configuration:
Jan 6 13:58:55 pocn501 kernel: [ 0.449579] .... node #0, CPUs: #1
Jan 6 13:58:55 pocn501 kernel: [ 0.468345] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Jan 6 13:58:55 pocn501 kernel: [ 0.477630] #2 #3 #4 #5 #6
Jan 6 13:58:55 pocn501 kernel: [ 0.551311] .... node #1, CPUs: #7
Jan 6 13:58:55 pocn501 kernel: [ 0.567061] ------------[ cut here ]------------
Jan 6 13:58:55 pocn501 kernel: [ 0.572421] WARNING: CPU: 7 PID: 0 at /build/linux-CMiYW9/linux-3.16.7-ckt2/arch/x86/kernel/smpboot.c:310 topology_sane.isra.2+0x7b/0x90()
Jan 6 13:58:55 pocn501 kernel: [ 0.586304] sched: CPU #7's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
Jan 6 13:58:55 pocn501 kernel: [ 0.597176] Modules linked in:
Jan 6 13:58:55 pocn501 kernel: [ 0.600591] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt2-1
Jan 6 13:58:55 pocn501 kernel: [ 0.610011] Hardware name: IBM IBM NeXtScale nx360 M5 -[5465FT1]-/00KG122, BIOS -[THE104FUS-1.03]- 11/26/2014
Jan 6 13:58:55 pocn501 kernel: [ 0.621079] 0000000000000009 ffffffff81507263 ffff88046f9f7e58 ffffffff81065847
Jan 6 13:58:55 pocn501 kernel: [ 0.629367] 0000000000000001 ffff88046f9f7ea8 ffff88087fc12980 0000000000012980
Jan 6 13:58:55 pocn501 kernel: [ 0.637657] 000000000000a060 ffffffff810658ac ffffffff8170f760 ffff880400000030
Jan 6 13:58:55 pocn501 kernel: [ 0.645948] Call Trace:
Jan 6 13:58:55 pocn501 kernel: [ 0.648678] [<ffffffff81507263>] ? dump_stack+0x41/0x51
Jan 6 13:58:55 pocn501 kernel: [ 0.654607] [<ffffffff81065847>] ? warn_slowpath_common+0x77/0x90
Jan 6 13:58:55 pocn501 kernel: [ 0.661505] [<ffffffff810658ac>] ? warn_slowpath_fmt+0x4c/0x50
Jan 6 13:58:55 pocn501 kernel: [ 0.668112] [<ffffffff810027ae>] ? calibrate_delay+0xbe/0x910
Jan 6 13:58:55 pocn501 kernel: [ 0.674622] [<ffffffff8104236b>] ? topology_sane.isra.2+0x7b/0x90
Jan 6 13:58:55 pocn501 kernel: [ 0.681519] [<ffffffff81042844>] ? set_cpu_sibling_map+0x484/0x500
Jan 6 13:58:55 pocn501 kernel: [ 0.688515] [<ffffffff81042a04>] ? start_secondary+0x144/0x2d0
Jan 6 13:58:55 pocn501 kernel: [ 0.695123] ---[ end trace 7f2af1a99481016b ]---
Jan 6 13:58:55 pocn501 kernel: [ 0.720515] #8 #9 #10 #11 #12 #13
Jan 6 13:58:55 pocn501 kernel: [ 0.808491] .... node #2, CPUs: #14 #15 #16 #17 #18 #19 #20
Jan 6 13:58:55 pocn501 kernel: [ 1.011650] .... node #3, CPUs: #21 #22 #23 #24 #25 #26 #27
Jan 6 13:58:55 pocn501 kernel: [ 1.135087] x86: Booted up 4 nodes, 28 CPUs
Jan 6 13:58:55 pocn501 kernel: [ 1.139961] smpboot: Total of 28 processors activated (145614.25 BogoMIPS)
====><===============
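To check other machines for the same symptom, the warning can be searched for in a saved kernel log. This is only a sketch: the regular expression is modelled on the single message quoted in this report, and the name `find_topology_warnings` is hypothetical. Other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the one warning line quoted in this report
# (assumption: other kernels may format the message differently).
WARNING_RE = re.compile(
    r"sched: CPU #(?P<cpu>\d+)'s\s+(?P<kind>\S+)-sibling CPU #(?P<sibling>\d+)"
    r"\s+is not on the same node!\s+"
    r"\[node: (?P<node>\d+) != (?P<sibling_node>\d+)\]"
)


def find_topology_warnings(log_text):
    """Return one dict per 'not on the same node' warning in log_text."""
    # Collapse all whitespace first so that a message that was wrapped
    # across lines (e.g. by a mail client) still matches.
    flat = " ".join(log_text.split())
    return [
        {k: (v if k == "kind" else int(v)) for k, v in m.groupdict().items()}
        for m in WARNING_RE.finditer(flat)
    ]
```

Applied to the log above, this would yield one entry for CPU #7 (node 1) and its mc-sibling CPU #0 (node 0).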
and lscpu now reports 4 sockets of 7 cores each, instead of 2 sockets
of 14 cores each:
====><===============
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Thread(s) per core: 1
Core(s) per socket: 7
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 1272.679
CPU max MHz: 3600,0000
CPU min MHz: 1200,0000
BogoMIPS: 5201.29
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 17920K
NUMA node0 CPU(s): 0-6
NUMA node1 CPU(s): 7-13
NUMA node2 CPU(s): 14-20
NUMA node3 CPU(s): 21-27
====><===============
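The difference between the two lscpu outputs can also be extracted mechanically, which is convenient when comparing many nodes. The sketch below assumes the plain `Key: value` layout shown above. The helper names `parse_lscpu` and `diff_lscpu` and the default key list are hypothetical choices for illustration.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines of lscpu output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the '# lscpu' prompt) are skipped.
        if sep and not line.lstrip().startswith("#"):
            fields[key.strip()] = value.strip()
    return fields


def diff_lscpu(before, after,
               keys=("Socket(s)", "Core(s) per socket", "NUMA node(s)")):
    """Return {key: (before value, after value)} for keys that differ."""
    a, b = parse_lscpu(before), parse_lscpu(after)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

With the two outputs in this report, only `Socket(s)` and `Core(s) per socket` differ, while `NUMA node(s)` stays at 4 on both kernels.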
I attach the relevant log files and the bug script output for your
convenience. Please let me know if you need more details.
Looking at recent changes in Linux 3.18, the following commits might
resolve this:
- cebf15eb09a2fd2fa73ee4faa9c4d2f813cf0f09
- 728e5653e6fdb2a0892e94a600aef8c9a036c7eb
(We intend to test this during the week.)
Regards
--
Mehdi
Attachments:
- kern_log_3.2.0-4-amd64.gz
- kern_log_3.16.0-4-amd64.gz
- reportbug-linux-image-3.2.0-4-amd64.gz
- reportbug-linux-image-3.16.0-4-amd64.gz
--- End Message ---