I built and tested a kernel based on 6.2.0-1004-nvidia with this patch
applied and did not see the warning message on boot. I'll follow up
further with Ian on Monday.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026891

Title:
  linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at
  init/main.c:1065 start_kernel+0x4da/0x540"

Status in linux-nvidia-6.2 package in Ubuntu:
  New

Bug description:
  We started testing the jammy/linux-nvidia-6.2 kernels on the NVIDIA
  servers (DGX-1/DGX-2/H100) and hit the following warning during boot:

  [    7.690486] ------------[ cut here ]------------
  [    7.690487] Interrupts were enabled early
  [    7.690490] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540
  [    7.690498] Modules linked in:
  [    7.690501] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
  [    7.690504] Hardware name: NVIDIA NVIDIA DGX-2/NVIDIA DGX-2, BIOS 0.29 06/07/2021
  [    7.690505] RIP: 0010:start_kernel+0x4da/0x540
  [    7.690508] Code: ff 48 c7 c7 e8 26 f0 97 e8 b3 59 a8 fd 0f 0b e9 96 fd ff ff e8 a7 1d 04 00 e9 7c fe ff ff 48 c7 c7 18 27 f0 97 e8 96 59 a8 fd <0f> 0b e9 ed fd ff ff 48 c7 c7 b0 26 f0 97 e8 83 59 a8 fd 0f 0b ff
  [    7.690510] RSP: 0000:ffffffff98803f08 EFLAGS: 00010246
  [    7.690512] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  [    7.690513] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [    7.690514] RBP: ffffffff98803f20 R08: 0000000000000000 R09: 0000000000000000
  [    7.690515] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000e0
  [    7.690516] R13: 000000005a1ccde0 R14: 000000005a1c7469 R15: 000000005a1d7ee0
  [    7.690518] FS:  0000000000000000(0000) GS:ffff964900600000(0000) knlGS:0000000000000000
  [    7.690520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    7.690521] CR2: ffff970bfffff000 CR3: 000000ecd7810001 CR4: 00000000000606f0
  [    7.690522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [    7.690523] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [    7.690524] Call Trace:
  [    7.690526]  <TASK>
  [    7.690529]  x86_64_start_kernel+0x102/0x180
  [    7.690536]  secondary_startup_64_no_verify+0xe5/0xeb
  [    7.690544]  </TASK>
  [    7.690544] ---[ end trace 0000000000000000 ]---

  I also see pretty much the same thing on some Ampere-based arm64
  servers:

  [    0.000519] ------------[ cut here ]------------
  [    0.000521] Interrupts were enabled early
  [    0.000525] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x3ac/0x514
  [    0.000531] Modules linked in:
  [    0.000535] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
  [    0.000538] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [    0.000540] pc : start_kernel+0x3ac/0x514
  [    0.000543] lr : start_kernel+0x3ac/0x514
  [    0.000545] sp : ffffdec5ff733e60
  [    0.000546] x29: ffffdec5ff733e60 x28: 00000819aa09baac x27: 0000403ffdd124e0
  [    0.000549] x26: 00000000bfdf3788 x25: 000000009b6fc000 x24: 00000000001dba7b
  [    0.000552] x23: 00005ec57c980000 x22: 00000819ab2a0000 x21: ffffdec5ff749140
  [    0.000555] x20: ffffdec5ff73d9c0 x19: ffffdec5ffbe4000 x18: ffffdec5ff74a1c8
  [    0.000558] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  [    0.000560] x14: 0000000000000000 x13: 0a796c7261652064 x12: 656c62616e652065
  [    0.000563] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000
  [    0.000565] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
  [    0.000568] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
  [    0.000571] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
  [    0.000573] Call trace:
  [    0.000574]  start_kernel+0x3ac/0x514
  [    0.000577]  __primary_switched+0xc0/0xc8
  [    0.000580] ---[ end trace 0000000000000000 ]---

  The warning does not appear on an older ThunderX2 server.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2026891/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
