On Sun, Jan 04, 2026 at 02:01:07PM +0100, Mark Kettenis wrote:
> > Date: Sat, 3 Jan 2026 20:50:23 -0800
> > From: Mike Larkin <[email protected]>
> >
> > On Tue, Dec 30, 2025 at 05:20:46PM +0100, Mark Kettenis wrote:
> > > > Date: Tue, 30 Dec 2025 07:46:16 +0100
> > > > From: Rafael Sadowski <[email protected]>
> > > >
> > > > On Mon Dec 29, 2025 at 06:17:16PM -0800, [email protected] wrote:
> > > > > I have the same machine and it works fine also, or at least it did
> > > > > last time I tried.
> > > > > Does it work if you ZZZ from the text console, right after boot?
> > > > > -ml
> > > >
> > > > Yes and no. Instead of getting stuck in the kernel boot I ends up in a
> > > > wired white artefact screen and then the only thing that helps is a hard
> > > > reset.
> > > >
> > > > I also reset my BIOS settings to factory defaults. No changes except
> > > > that my OpenBSD EFI boot entry was gone.
> > > >
> > > > Perhaps something with the GPU:
> > > >
> > > > dmesg| grep amd
> > > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > amdpmc0 at acpi0: PEP_
> > > > amdpmc0: SMU program 0 version 76.93.0
> > > > amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
> > > > amdgpu0 at pci6 dev 0 function 0 "ATI Hawk Point" rev 0xd0
> > > > drm0 at amdgpu0
> > > > amdgpu0: msi
> > > > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x0c
> > > > amdgpu0: 1920x1200, 32bpp
> > > > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> > >
> > > My X13 Gen 4 AMD has essentially the same GPU:
> > >
> > > amdgpu0 at pci5 dev 0 function 0 "ATI Phoenix" rev 0xdd
> > > drm0 at amdgpu0
> > > amdgpu0: msi
> > > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x09
> > > amdgpu0: 1920x1200, 32bpp
> > >
> > > Hibernate "works" on this machine but:
> > >
> > > * After unhibernate, the framebuffer is filled with random crap; we
> > >   probably need to clear it in the driver somewhere.
> > >
> > > * After unhibernate, qwx(4) is somewhat hosed. It works, but if you
> > >   try to down the interface, it hangs. It seems that the "head
> > >   pointer" for one of the ring gets corrupted and this makes the
> > >   driver go into an infinite loop. I can break into that loop using
> > >   CTRL-ALT-ESC though (sysctl ddb.console=1). I'm investigating this
> > >   issue.
> > >
> > > * Sometimes I get a kernel that always produces a
> > >
> > >   "unhibernate failed: original kernel changed"
> > >
> > >   message.
> > >
> >
> > Some comments -
> >
> > 1. if unhibernate tries to unhibernate but fails (wrong kernel, etc), you
> >    are certainly going to have a hosed machine. This is because the
> >    unhibernating kernel is booting in a neutered mode where a bunch of
> >    devices are disabled, as well as all the APs. At best, this leads to a
> >    weird experience; at worst, things hang or crash later. Theo and I have
> >    discussed what we should do in this case, since there is no way to
> >    rewind autoconf and "retry". I suggested just rebooting; theo suggested
> >    maybe some informational panic message. I'm not sure if this is what
> >    you are seeing in any of the above cases, but I wanted to point that
> >    out.
>
> I'm obviously seeing this when I get the "original kernel changed"
> failure. I was somewhat confused why I couldn't ssh into the machine
> at first, but yes, I realized that we booted without qwx(4) and from
> then on just reboot when I end up in this case.
>
> > 2. regarding the "original kernel changed" - the only way this happens if
> >    you booted, changed your kernel, then hibernated. there is no other way
> >    I can see this happening. I have done this in the past and seen the
> >    same:
> >
> >    a. boot machine
> >    b. someone asks me to test a diff, or I'm testing a diff of my own
> >    c. build and install new kernel, but not ready for reboot yet (doing
> >       other things)
> >    d. forget I installed a new kernel and ZZZ
> >    e. reboot, unhibernate prints that message.
> >
> > The code that does this signature check is:
> >
> >	SHA256Init(&ctx);
> >	SHA256Update(&ctx, version, strlen(version));
> >	fn = printf;
> >	SHA256Update(&ctx, &fn, sizeof(fn));
> >	fn = malloc;
> >	SHA256Update(&ctx, &fn, sizeof(fn));
> >	fn = km_alloc;
> >	SHA256Update(&ctx, &fn, sizeof(fn));
> >	fn = strlen;
> >	SHA256Update(&ctx, &fn, sizeof(fn));
> >	SHA256Final((u_int8_t *)&hib->kern_hash, &ctx);
> >
> > ... so it just fingerprints a bunch of things and then does a sha256
> > compare on unpack.
> >
> > I don't know how to prevent that footgun however, aside from moving all
> > the signature checking up into the bootloader and not even attempting the
> > unhibernate if we see this situation. That doesn't "fix" the problem but
> > at least you aren't running on some halfway-autoconf'ed kernel when it
> > fails. Moving this stuff into the bootloader is not trivial; I tried this
> > in 2010-2011 and gave up.
> >
> > Regarding the other things (device issues, hangs, etc), I have some ideas
> > on how to potentially print more information but it needs to be coded.
>
> I'm 100% sure that I am booting the correct kernel. The checksum
> calculated by that code above is the same. But for some reason the
> checksum that we read back from the hibernation info on disk is
> all-zeroes. So something is going wrong. Will dig deeper when I have
> time.

Just following up here, since you obviously checked this. Are you saying
that the *checksum* is zero but the magic number at the start of the
signature block is still valid, as well as the memory range data, etc.?
I'm trying to understand whether the entire signature block is zeros or
*just* the kernel checksum.

If it's entirely zero, including the field for the magic number, then the
problem lies in the bootloader somehow thinking it's an unhibernate when
it really isn't. If the signature block is properly "there" but with an
all-zero kernel checksum, then the problem is in the code that calculated
the checksum and wrote it out when the ZZZ happened.

Ideas?

-ml
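
P.S. To make the two cases above concrete, here is a rough userland sketch
of the check I have in mind. Everything in it is an assumption for
illustration: struct sig_block is a simplified stand-in and not the real
on-disk layout, SIG_MAGIC is a placeholder value, and classify_sig() is a
hypothetical helper, not something that exists in the tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_MAGIC 0x12345678U	/* placeholder, NOT the real magic value */
#define HASH_LEN  32		/* SHA-256 digest length in bytes */

/* Hypothetical, simplified stand-in for the on-disk signature block. */
struct sig_block {
	uint32_t magic;
	uint64_t nranges;
	uint8_t  kern_hash[HASH_LEN];
};

enum sig_state {
	SIG_ALL_ZERO,		/* whole block is zero: suspect the bootloader */
	SIG_BAD_MAGIC,		/* nonzero garbage where the magic should be */
	SIG_HASH_ZERO,		/* block looks valid, hash alone is zero: */
				/* suspect the write-out path at ZZZ time */
	SIG_HASH_MISMATCH,	/* genuine "original kernel changed" case */
	SIG_OK
};

/* Return 1 if every byte in [p, p + len) is zero, else 0. */
static int
all_zero(const void *p, size_t len)
{
	const uint8_t *b = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}

/*
 * Classify a signature block read back from disk against the hash the
 * running kernel computed for itself. The whole-block test also covers
 * padding bytes, so it assumes the block was read into a zeroed buffer.
 */
enum sig_state
classify_sig(const struct sig_block *disk, const uint8_t expected[HASH_LEN])
{
	if (all_zero(disk, sizeof(*disk)))
		return SIG_ALL_ZERO;
	if (disk->magic != SIG_MAGIC)
		return SIG_BAD_MAGIC;
	if (all_zero(disk->kern_hash, HASH_LEN))
		return SIG_HASH_ZERO;
	if (memcmp(disk->kern_hash, expected, HASH_LEN) != 0)
		return SIG_HASH_MISMATCH;
	return SIG_OK;
}
```

Printing which of these states we hit at the point where the "original
kernel changed" message is produced would tell us which side to look at.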
