This change was made by a bot.

** Changed in: linux (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1906716

Title:
  Stack trace booting 20.04 LTS server on system with dual Xeon Gold
  6240 CPUs

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I noticed this in syslog while investigating an unrelated issue today.
  I have Focal installed on a Fujitsu RX2530 M5 server with two Xeon
  Gold 6240 18c/36t CPUs.  Every reboot results in the following MSR
  stack trace:

  Dec  3 17:34:31 nabbit kernel: [    0.002463] smpboot: CPU 18 Converting physical 0 to logical die 1
  Dec  3 17:34:31 nabbit kernel: [    0.002463] unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff81c78b04 (native_write_msr+0x4/0x30)
  Dec  3 17:34:31 nabbit kernel: [    0.002463] Call Trace:
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  ? intel_pmu_cpu_starting+0x87/0x270
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  ? x86_pmu_dead_cpu+0x30/0x30
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  x86_pmu_starting_cpu+0x1a/0x30
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  cpuhp_invoke_callback+0x9b/0x580
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  notify_cpu_starting+0x66/0x80
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  start_secondary+0xaa/0x1c0
  Dec  3 17:34:31 nabbit kernel: [    0.002463]  secondary_startup_64+0xa4/0xb0
  Dec  3 17:34:31 nabbit kernel: [    0.498575]  #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
  Dec  3 17:34:31 nabbit kernel: [    0.618576] .... node  #0, CPUs:   #36
  Dec  3 17:34:31 nabbit kernel: [    0.623308] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
  Dec  3 17:34:31 nabbit kernel: [    0.623308] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
  Dec  3 17:34:31 nabbit kernel: [    0.623308]  #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53
  Dec  3 17:34:31 nabbit kernel: [    0.672450] .... node  #1, CPUs:   #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
  Dec  3 17:34:31 nabbit kernel: [    0.729432] smp: Brought up 2 nodes, 72 CPUs
  Dec  3 17:34:31 nabbit kernel: [    0.729432] smpboot: Max logical packages: 2
  Dec  3 17:34:31 nabbit kernel: [    0.729432] smpboot: Total of 72 processors activated (374479.29 BogoMIPS)


  It doesn't seem to be catastrophic, but it is troubling to find this
  in the logs.

  On a different Fujitsu server (RX2540 M5) with 2x Xeon Gold 6242 CPUs
  (16c/32t), this trace is not present, so this could indicate a problem
  with this particular machine or this particular CPU model.
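
A quick way to check other machines for the same message is to scan their syslog for the error line. The following is a minimal sketch in Python, not part of the original report. The function name `find_msr_errors` is hypothetical, the WRMSR pattern is taken from the log line quoted above, and the RDMSR variant is an assumption about how the kernel words the read case. It expects each log entry on a single line, as in the raw syslog file.

```python
import re

# Pattern modelled on the "unchecked MSR access error" line quoted in this report.
# The RDMSR ("from") form is an assumption; only the WRMSR form appears above.
MSR_ERROR = re.compile(
    r"unchecked MSR access error: (?P<op>WRMSR|RDMSR) (?:to|from) "
    r"(?P<msr>0x[0-9a-fA-F]+)"
    r"(?: \(tried to write (?P<value>0x[0-9a-fA-F]+)\))?"
)


def find_msr_errors(lines):
    """Return a list of (operation, msr_address, written_value) tuples.

    written_value is None when the line carries no "tried to write" part.
    """
    hits = []
    for line in lines:
        match = MSR_ERROR.search(line)
        if match is None:
            continue
        value = match.group("value")
        hits.append((
            match.group("op"),
            int(match.group("msr"), 16),
            int(value, 16) if value is not None else None,
        ))
    return hits
```

Feeding it the lines of /var/log/syslog from both servers (for example `find_msr_errors(open("/var/log/syslog", errors="replace"))`) would be one way to confirm whether the WRMSR to 0x10f shows up only on the RX2530 M5.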

  Here is the smp boot from the non-failing machine:

  
  Dec  2 16:02:56 polari kernel: [    1.522346] smpboot: CPU0: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (family: 0x6, model: 0x55, stepping: 0x5)
  Dec  2 16:02:56 polari kernel: [    1.522575] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
  Dec  2 16:02:56 polari kernel: [    1.522584] ... version:                4
  Dec  2 16:02:56 polari kernel: [    1.522585] ... bit width:              48
  Dec  2 16:02:56 polari kernel: [    1.522587] ... generic registers:      4
  Dec  2 16:02:56 polari kernel: [    1.522588] ... value mask:             0000ffffffffffff
  Dec  2 16:02:56 polari kernel: [    1.522589] ... max period:             00007fffffffffff
  Dec  2 16:02:56 polari kernel: [    1.522591] ... fixed-purpose events:   3
  Dec  2 16:02:56 polari kernel: [    1.522592] ... event mask:             000000070000000f
  Dec  2 16:02:56 polari kernel: [    1.522665] rcu: Hierarchical SRCU implementation.
  Dec  2 16:02:56 polari kernel: [    1.524965] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
  Dec  2 16:02:56 polari kernel: [    1.525875] smp: Bringing up secondary CPUs ...
  Dec  2 16:02:56 polari kernel: [    1.525990] x86: Booting SMP configuration:
  Dec  2 16:02:56 polari kernel: [    1.525992] .... node  #0, CPUs:        #1  #2  #3
  Dec  2 16:02:56 polari kernel: [    1.533485] .... node  #1, CPUs:    #4  #5  #6  #7
  Dec  2 16:02:56 polari kernel: [    1.543960] .... node  #0, CPUs:    #8  #9 #10 #11
  Dec  2 16:02:56 polari kernel: [    1.553544] .... node  #1, CPUs:   #12 #13 #14 #15
  Dec  2 16:02:56 polari kernel: [    1.564701] .... node  #2, CPUs:   #16
  Dec  2 16:02:56 polari kernel: [    0.002176] smpboot: CPU 16 Converting physical 0 to logical die 1
  Dec  2 16:02:56 polari kernel: [    1.651254]  #17 #18 #19
  Dec  2 16:02:56 polari kernel: [    1.659278] .... node  #3, CPUs:   #20 #21 #22 #23
  Dec  2 16:02:56 polari kernel: [    1.669669] .... node  #2, CPUs:   #24 #25 #26 #27
  Dec  2 16:02:56 polari kernel: [    1.680637] .... node  #3, CPUs:   #28 #29 #30 #31
  Dec  2 16:02:56 polari kernel: [    1.691394] .... node  #0, CPUs:   #32
  Dec  2 16:02:56 polari kernel: [    1.693845] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
  Dec  2 16:02:56 polari kernel: [    1.693845] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
  Dec  2 16:02:56 polari kernel: [    1.693845]  #33 #34 #35
  Dec  2 16:02:56 polari kernel: [    1.701504] .... node  #1, CPUs:   #36 #37 #38 #39
  Dec  2 16:02:56 polari kernel: [    1.712687] .... node  #0, CPUs:   #40 #41 #42 #43
  Dec  2 16:02:56 polari kernel: [    1.723263] .... node  #1, CPUs:   #44 #45 #46 #47
  Dec  2 16:02:56 polari kernel: [    1.733658] .... node  #2, CPUs:   #48 #49 #50 #51
  Dec  2 16:02:56 polari kernel: [    1.744372] .... node  #3, CPUs:   #52 #53 #54 #55
  Dec  2 16:02:56 polari kernel: [    1.755243] .... node  #2, CPUs:   #56 #57 #58 #59
  Dec  2 16:02:56 polari kernel: [    1.765640] .... node  #3, CPUs:   #60 #61 #62 #63
  Dec  2 16:02:56 polari kernel: [    1.776965] smp: Brought up 4 nodes, 64 CPUs
  Dec  2 16:02:56 polari kernel: [    1.776965] smpboot: Max logical packages: 2
  Dec  2 16:02:56 polari kernel: [    1.776965] smpboot: Total of 64 processors activated (358464.56 BogoMIPS)


  
  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-56-generic 5.4.0-56.62
  ProcVersionSignature: Ubuntu 5.4.0-56.62-generic 5.4.73
  Uname: Linux 5.4.0-56-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Dec  3 20:02 seq
   crw-rw---- 1 root audio 116, 33 Dec  3 20:02 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.10
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: skip
  Date: Thu Dec  3 20:15:56 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 0424:2533 Microchip Technology, Inc. (formerly SMSC)
   Bus 001 Device 004: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
   Bus 001 Device 003: ID 046b:ff01 American Megatrends, Inc. Virtual Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: FUJITSU PRIMERGY RX2530 M5
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgag200drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-56-generic root=UUID=0e82de6f-eac2-426d-b89e-e52b1acaa792 ro console=tty0
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-56-generic N/A
   linux-backports-modules-5.4.0-56-generic  N/A
   linux-firmware                            1.187.4
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 10/17/2019
  dmi.bios.vendor: FUJITSU // American Megatrends Inc.
  dmi.bios.version: V5.0.0.14 R1.15.0 for D3383-B1x
  dmi.board.name: D3383-B1
  dmi.board.vendor: FUJITSU
  dmi.board.version: S26361-D3383-B13 WGS04 GS01
  dmi.chassis.asset.tag: nabbit
  dmi.chassis.type: 23
  dmi.chassis.vendor: FUJITSU
  dmi.chassis.version: RX2530M5R3
  dmi.modalias: dmi:bvnFUJITSU//AmericanMegatrendsInc.:bvrV5.0.0.14R1.15.0forD3383-B1x:bd10/17/2019:svnFUJITSU:pnPRIMERGYRX2530M5:pvr:rvnFUJITSU:rnD3383-B1:rvrS26361-D3383-B13WGS04GS01:cvnFUJITSU:ct23:cvrRX2530M5R3:
  dmi.product.family: SERVER
  dmi.product.name: PRIMERGY RX2530 M5
  dmi.product.sku: S26361-K1659-Vxxx
  dmi.sys.vendor: FUJITSU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1906716/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
