Bug#1044518: marked as done (linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot with -24 bullseye kernel)

Debian Bug Tracking System Fri, 18 Aug 2023 13:33:38 -0700

Your message dated Fri, 18 Aug 2023 21:29:25 +0100
with message-id 
<2dadb28ca368809acbb9900196ab200e626ae565.ca...@adam-barratt.org.uk>
and subject line Re: Bug#1044518: linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" 
stacktrace in early boot with -24 bullseye kernel
has caused the Debian Bug report #1044518,
regarding linux: "RIP: 0010:get_xsave_addr+0x9b/0xb0" stacktrace in early boot 
with -24 bullseye kernel
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
1044518: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1044518
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Source: linux
Version: 5.10.179-5
User: [email protected]
Usertags: needed-by-DSA-Team
X-Debbugs-Cc: [email protected], [email protected]

Hi,

Since the kernels on both the host and guests were upgraded to
5.10.179-5 (from 5.10.179-3), the guests on one of our Ganeti clusters
have been reporting as tainted. Looking at dmesg shows the following
trace early in boot:

[    0.201347] RIP: 0010:get_xsave_addr+0x9b/0xb0
[    0.201351] Code: 48 83 c4 08 5b e9 15 80 bc 00 80 3d 8d 7c 80 01 00 75 a8 48 c7 c7 97 de 6b b2 89 74 24 04 c6 05 79 7c 80 01 01 e8 f5 96 88 00 <0f> 0b 8b 74 24 04 eb 89 31 c0 e9 e6 7f bc 00 66 0f 1f 44 00 00 89
[    0.201353] RSP: 0000:ffffffffb2c03ec8 EFLAGS: 00010282
[    0.201356] RAX: 0000000000000000 RBX: ffffffffb2e6a600 RCX: ffffffffb2cb3768
[    0.201358] RDX: c0000000ffffefff RSI: 00000000ffffefff RDI: 0000000000000247
[    0.201359] RBP: ffffffffb2e6a4a0 R08: 0000000000000000 R09: ffffffffb2c03ce8
[    0.201361] R10: ffffffffb2c03ce0 R11: ffffffffb2ccb7a8 R12: 0000000000000246
[    0.201362] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.201365] FS:  0000000000000000(0000) GS:ffff9588fbc00000(0000) knlGS:0000000000000000
[    0.201367] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.201368] CR2: ffff9588fffff000 CR3: 000000008260a001 CR4: 00000000007308b0
[    0.201373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.201374] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.201376] Call Trace:
[    0.201383]  identify_cpu+0x51f/0x540
[    0.201389]  identify_boot_cpu+0xc/0x94
[    0.201392]  arch_cpu_finalize_init+0x5/0x47
[    0.201395]  start_kernel+0x4ec/0x599
[    0.201401]  secondary_startup_64_no_verify+0xb0/0xbb
[    0.201406] ---[ end trace d7d9074a88473cb2 ]---

The systems seem to be running OK, but the stacktrace presumably points
to an issue somewhere.

A sample kvm invocation for an affected guest is

ganeti04   18354 30.1  0.5 6015620 1114084 ?     Sl   Aug11 832:22 /usr/bin/kvm -name geo1.debian.org -m 1024 -smp 2 -pidfile /var/run/ganeti/kvm-hypervisor/pid/geo1.debian.org -device virtio-balloon -daemonize -D /var/log/ganeti/kvm/geo1.debian.org.log -machine pc-i440fx-5.2 -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.serial,server,nowait -usb -display none -cpu host -uuid 36cf5fbc-1414-4b27-874e-ea3153150aa9 -device virtio-rng-pci,bus=pci.0,addr=0x1e,max-bytes=1024,period=1000 -global isa-fdc.fdtypeA=none -netdev type=tap,id=nic-6e9afdf8-ccaf-42e8,fd=10 -device virtio-net-pci,id=nic-6e9afdf8-ccaf-42e8,bus=pci.0,addr=0xd,netdev=nic-6e9afdf8-ccaf-42e8,mac=aa:00:00:46:8f:08 -incoming tcp:172.29.182.13:8102 -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.qmp,server,nowait -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/geo1.debian.org.kvmd,server,nowait -boot c -device virtio-blk-pci,id=disk-8a45befd-be45-4b75,bus=pci.0,addr=0xc,drive=disk-8a45befd-be45-4b75 -drive file=/var/run/ganeti/instance-disks/geo1.debian.org:0,format=raw,if=none,aio=threads,cache=none,discard=unmap,id=disk-8a45befd-be45-4b75,auto-read-only=off -runas ganeti04

It seems that buster guests on the same host are unaffected, with
similar-looking command lines.

The host's CPUs are Intel Xeon Silver 4110. Our other x86-64 clusters
either use AMD CPUs (also with "-cpu host") or Xeon E5-2699 v3 CPUs,
with "-cpu Haswell-noTSX".

Regards,

Adam

--- End Message ---
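
For readers following the report above: a guest "reporting as tainted"
refers to the kernel taint mask exposed in /proc/sys/kernel/tainted. The
following is a minimal Python sketch, not part of the original report, that
decodes such a mask into flag letters. The bit table follows the kernel's
tainted-kernels documentation; the function names are hypothetical, and the
example value 512 (bit 9, 'W', set once the kernel has issued a warning) is
an assumption about what a trace like this one would leave behind, since
the report does not quote the actual mask.

```python
# Taint bits as listed in the kernel's admin-guide/tainted-kernels document.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force loaded"),
    2: ("S", "kernel running on out-of-spec system"),
    3: ("R", "module force unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died (OOPS or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel live patched"),
    16: ("X", "auxiliary taint, defined by distributions"),
    17: ("T", "kernel built with struct randomization"),
}


def decode_taint(value):
    """Return one description string per taint bit set in `value`."""
    return [
        "%s (bit %d): %s" % (letter, bit, desc)
        for bit, (letter, desc) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint mask; only meaningful on a Linux system."""
    with open(path) as f:
        return int(f.read().strip())


# Hypothetical mask with only the warning bit set (assumed, not from the report).
for line in decode_taint(512):
    print(line)
```

On an affected guest, decode_taint(read_taint()) would show which flags are
actually set; a mask of 0 yields an empty list.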

--- Begin Message ---

Version: 5.10.191-1

Hi,

On Tue, 2023-08-15 at 23:08 +0200, Salvatore Bonaccorso wrote:
> Hi Adam,
> 
> On Tue, Aug 15, 2023 at 10:48:35PM +0200, Salvatore Bonaccorso wrote:
> > Control: tags -1 + upstream
> > 
> > Hi Adam,
> > 
> > On Tue, Aug 15, 2023 at 10:06:16PM +0200, Salvatore Bonaccorso wrote:
> > > Hi Adam,
> > > 
> > > On Tue, Aug 15, 2023 at 09:37:36PM +0200, Salvatore Bonaccorso wrote:
> > > > Control: tags -1 + confirmed
> > > > 
> > > > Hi Adam,
> > > > 
> > > > On Tue, Aug 15, 2023 at 06:26:59PM +0100, Adam D. Barratt wrote:
> > > > > On Sun, 2023-08-13 at 18:21 +0100, Adam D. Barratt wrote:
> > > > > > Since the kernels on both the host and guests were upgraded to
> > > > > > 5.10.179-5 (from 5.10.179-3), the guests on one of our Ganeti
> > > > > > clusters have been reporting as tainted. Looking at dmesg shows
> > > > > > the following trace early in boot:
> > > > > > 
[...]
> > Quick summary: v5.10.190 upstream exhibits the same problem, so it is
> > not a backporting problem, and v5.10.191-rc1 for the upcoming 5.10.191
> > seems to fix the issue.
> 
> This should be fixed by b3607269ff57 ("x86/pkeys: Revert a5eff7259790
> ("x86/pkeys: Add PKRU value to init_fpstate")")[1] upstream, which is
> going to be applied in 5.10.191.
> 
>  [1] https://git.kernel.org/linus/b3607269ff57fd3c9690cb25962c5e4b91a0fd3b
> 

I'm happy to confirm that the 5.10.191-1 kernel fixes this issue for
us; closing appropriately.

Regards,

Adam

--- End Message ---
