> Date: Thu, 8 Jan 2026 14:37:43 -0800
> From: Mike Larkin <[email protected]>
>
> On Sun, Jan 04, 2026 at 02:01:07PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 3 Jan 2026 20:50:23 -0800
> > > From: Mike Larkin <[email protected]>
> > >
> > > On Tue, Dec 30, 2025 at 05:20:46PM +0100, Mark Kettenis wrote:
> > > > > Date: Tue, 30 Dec 2025 07:46:16 +0100
> > > > > From: Rafael Sadowski <[email protected]>
> > > > >
> > > > > On Mon Dec 29, 2025 at 06:17:16PM -0800, [email protected] wrote:
> > > > > > I have the same machine and it works fine also, or at least it
> > > > > > did last time I tried.
> > > > > > Does it work if you ZZZ from the text console, right after boot?
> > > > > > -ml
> > > > >
> > > > > Yes and no. Instead of getting stuck in the kernel boot, I end up
> > > > > with a weird white artefact screen, and then the only thing that
> > > > > helps is a hard reset.
> > > > >
> > > > > I also reset my BIOS settings to factory defaults. No changes except
> > > > > that my OpenBSD EFI boot entry was gone.
> > > > >
> > > > > Perhaps something with the GPU:
> > > > >
> > > > > dmesg | grep amd
> > > > >
> > > > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > > amdpmc0 at acpi0: PEP_
> > > > > amdpmc0: SMU program 0 version 76.93.0
> > > > > amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
> > > > > amdgpu0 at pci6 dev 0 function 0 "ATI Hawk Point" rev 0xd0
> > > > > drm0 at amdgpu0
> > > > > amdgpu0: msi
> > > > > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x0c
> > > > > amdgpu0: 1920x1200, 32bpp
> > > > > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> > > >
> > > > My X13 Gen 4 AMD has essentially the same GPU:
> > > >
> > > > amdgpu0 at pci5 dev 0 function 0 "ATI Phoenix" rev 0xdd
> > > > drm0 at amdgpu0
> > > > amdgpu0: msi
> > > > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x09
> > > > amdgpu0: 1920x1200, 32bpp
> > > >
> > > > Hibernate "works" on this machine, but:
> > > >
> > > > * After unhibernate, the framebuffer is filled with random crap; we
> > > > probably need to clear it in the driver somewhere.
> > > >
> > > > * After unhibernate, qwx(4) is somewhat hosed. It works, but if you
> > > > try to down the interface, it hangs. It seems that the "head
> > > > pointer" for one of the rings gets corrupted and this makes the
> > > > driver go into an infinite loop. I can break into that loop using
> > > > CTRL-ALT-ESC though (sysctl ddb.console=1). I'm investigating this
> > > > issue.
> > > >
> > > > * Sometimes I get a kernel that always produces a
> > > >
> > > > "unhibernate failed: original kernel changed"
> > > >
> > > > message.
> > > >
> > >
> > > Some comments -
> > >
> > > 1. if unhibernate tries to unhibernate but fails (wrong kernel, etc.),
> > > you are certainly going to have a hosed machine. This is because the
> > > unhibernating kernel is booting in a neutered mode where a bunch of
> > > devices are disabled, as well as all the APs.
> > > At best, this leads to a weird experience; at worst, things hang or
> > > crash later. Theo and I have discussed what we should do in this case,
> > > since there is no way to rewind autoconf and "retry". I suggested just
> > > rebooting; Theo suggested maybe some informational panic message. I'm
> > > not sure if this is what you are seeing in any of the above cases, but
> > > I wanted to point that out.
> >
> > I'm obviously seeing this when I get the "original kernel changed"
> > failure. I was somewhat confused why I couldn't ssh into the machine
> > at first, but yes, I realized that we booted without qwx(4) and from
> > then on just reboot when I end up in this case.
> >
> > > 2. regarding the "original kernel changed" - the only way this happens
> > > is if you booted, changed your kernel, then hibernated. there is no
> > > other way I can see this happening. I have done this in the past and
> > > seen the same:
> > >
> > > a. boot machine
> > > b. someone asks me to test a diff, or I'm testing a diff of my own
> > > c. build and install new kernel, but not ready for reboot yet (doing
> > > other things)
> > > d. forget I installed a new kernel and ZZZ
> > > e. reboot, unhibernate prints that message.
> > >
> > > The code that does this signature check is:
> > >
> > >     SHA256Init(&ctx);
> > >     SHA256Update(&ctx, version, strlen(version));
> > >     fn = printf;
> > >     SHA256Update(&ctx, &fn, sizeof(fn));
> > >     fn = malloc;
> > >     SHA256Update(&ctx, &fn, sizeof(fn));
> > >     fn = km_alloc;
> > >     SHA256Update(&ctx, &fn, sizeof(fn));
> > >     fn = strlen;
> > >     SHA256Update(&ctx, &fn, sizeof(fn));
> > >     SHA256Final((u_int8_t *)&hib->kern_hash, &ctx);
> > >
> > > ... so it just fingerprints a bunch of things and then does a sha256
> > > compare on unpack.
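To make the quoted signature check concrete, here is a minimal userland sketch of the same idea in C. It is an illustration under stated assumptions, not the kernel code: 64-bit FNV-1a is swapped in for SHA256 only so the example needs no crypto library (it is not a cryptographic hash), the version string is a hypothetical placeholder, km_alloc is left out because it only exists in the kernel, and kern_fingerprint() and kern_fingerprint_matches() are made-up names. It also shows why an all-zero stored checksum fails on unpack: it is compared against a freshly computed value, which is essentially never zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholder for the kernel's version[] string. */
static const char version[] = "OpenBSD hypothetical (GENERIC.MP) #1";

typedef void (*fnptr_t)(void);

/*
 * 64-bit FNV-1a, used instead of SHA256 purely to keep the sketch
 * self-contained. Folds len bytes of buf into the running hash h.
 */
static uint64_t
fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		h ^= *p++;
		h *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return h;
}

/*
 * Same shape as the quoted code: hash the version string, then the raw
 * bytes of a few function addresses (km_alloc omitted: kernel-only).
 */
static uint64_t
kern_fingerprint(void)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	fnptr_t fn;

	h = fnv1a(h, version, strlen(version));
	fn = (fnptr_t)printf;
	h = fnv1a(h, &fn, sizeof(fn));
	fn = (fnptr_t)malloc;
	h = fnv1a(h, &fn, sizeof(fn));
	fn = (fnptr_t)strlen;
	h = fnv1a(h, &fn, sizeof(fn));
	return h;
}

/* Unpack-time check: 1 if the stored fingerprint equals the current one. */
static int
kern_fingerprint_matches(uint64_t stored)
{
	return stored == kern_fingerprint();
}
```

Because function addresses are part of the input, a kernel whose layout differs from the one that wrote the image produces a different fingerprint even if the version string looks the same. In this userland sketch ASLR also moves the addresses between runs, so stored and computed values only agree within one process.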
> > >
> > > I don't know how to prevent that footgun, however, aside from moving
> > > all the signature checking up into the bootloader and not even
> > > attempting the unhibernate if we see this situation. That doesn't
> > > "fix" the problem, but at least you aren't running on some
> > > halfway-autoconf'ed kernel when it fails. Moving this stuff into the
> > > bootloader is not trivial; I tried this in 2010-2011 and gave up.
> > >
> > > Regarding the other things (device issues, hangs, etc.), I have some
> > > ideas on how to potentially print more information, but it needs to be
> > > coded.
> >
> > I'm 100% sure that I am booting the correct kernel. The checksum
> > calculated by that code above is the same. But for some reason the
> > checksum that we read back from the hibernation info on disk is
> > all-zeroes. So something is going wrong. Will dig deeper when I have
> > time.
>
> just following up here -
>
> so you obviously checked this. are you saying that the *checksum* is zero
> but somehow the magic number at the start of the signature block is still
> valid, as well as the memory range data/etc?
>
> I'm trying to understand if the entire signature block is zeros or *just*
> the kernel checksum.
>
> If it's entirely zero, including the field for the magic number, then the
> problem lies in the bootloader somehow thinking it's an unhibernate when it
> really isn't.
>
> If the signature block is properly "there" but with an all-zero kernel
> checksum, then the problem is in the code that calculated that and wrote it
> out when the ZZZ happened.
>
> Ideas?
I'm trying to narrow things down a bit further. But after adding some debug printfs, I haven't been able to reproduce the issue :(.
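Mike's two cases can be written down as a small decision helper whose result a debug printf could report for the block unhibernate reads back. This is only a sketch: struct sig_block, its fields, SIG_MAGIC and its value are hypothetical stand-ins, not the kernel's real on-disk hibernate info layout, and the SIG_BAD_MAGIC bucket (a non-zero block without a valid magic) is an extra case added for completeness rather than one Mike described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_MAGIC	0x0B5D0B5DU	/* hypothetical magic value */
#define SIG_HASH_LEN	32		/* SHA256 digest size */

/* Hypothetical, simplified signature block layout. */
struct sig_block {
	uint32_t magic;
	uint32_t nranges;		/* number of memory ranges */
	uint64_t range_base[4];
	uint64_t range_end[4];
	uint8_t  kern_hash[SIG_HASH_LEN];
};

enum sig_diag {
	SIG_ALL_ZERO,	/* whole block zero: bootloader misdetected an unhibernate */
	SIG_BAD_MAGIC,	/* non-zero, but no valid magic: some other corruption */
	SIG_HASH_ZERO,	/* valid magic, zero kernel hash: ZZZ-side compute/write bug */
	SIG_OK		/* looks well-formed: compare the hash as usual */
};

/* Returns 1 if all len bytes of buf are zero. */
static int
all_zero(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

static enum sig_diag
sig_classify(const struct sig_block *sig)
{
	assert(sig != NULL);
	if (all_zero(sig, sizeof(*sig)))
		return SIG_ALL_ZERO;
	if (sig->magic != SIG_MAGIC)
		return SIG_BAD_MAGIC;
	if (all_zero(sig->kern_hash, sizeof(sig->kern_hash)))
		return SIG_HASH_ZERO;
	return SIG_OK;
}
```

Checking the whole block before the individual fields is what separates the two hypotheses: a bootloader misdetection shows up as SIG_ALL_ZERO, while a problem in the code that writes the block at ZZZ time shows up as SIG_HASH_ZERO.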
