> Date: Sat, 3 Jan 2026 20:50:23 -0800
> From: Mike Larkin <[email protected]>
> 
> On Tue, Dec 30, 2025 at 05:20:46PM +0100, Mark Kettenis wrote:
> > > Date: Tue, 30 Dec 2025 07:46:16 +0100
> > > From: Rafael Sadowski <[email protected]>
> > >
> > > On Mon Dec 29, 2025 at 06:17:16PM -0800, [email protected] wrote:
> > > >    I have the same machine and it works fine also, or at least it did
> > > >    last time I tried.
> > > >    Does it work if you ZZZ from the text console, right after boot?
> > > >    -ml
> > >
> > > Yes and no. Instead of getting stuck in the kernel boot I end up in a
> > > weird white artefact screen and then the only thing that helps is a hard
> > > reset.
> > >
> > > I also reset my BIOS settings to factory defaults. No changes except
> > > that my OpenBSD EFI boot entry was gone.
> > >
> > > Perhaps something with the GPU:
> > >
> > > dmesg| grep amd
> > >     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > amdpmc0 at acpi0: PEP_
> > > amdpmc0: SMU program 0 version 76.93.0
> > > amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
> > > amdgpu0 at pci6 dev 0 function 0 "ATI Hawk Point" rev 0xd0
> > > drm0 at amdgpu0
> > > amdgpu0: msi
> > > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x0c
> > > amdgpu0: 1920x1200, 32bpp
> > > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> >
> > My X13 Gen 4 AMD has essentially the same GPU:
> >
> > amdgpu0 at pci5 dev 0 function 0 "ATI Phoenix" rev 0xdd
> > drm0 at amdgpu0
> > amdgpu0: msi
> > amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x09
> > amdgpu0: 1920x1200, 32bpp
> >
> > Hibernate "works" on this machine but:
> >
> > * After unhibernate, the framebuffer is filled with random crap; we
> >   probably need to clear it in the driver somewhere.
> >
> > * After unhibernate, qwx(4) is somewhat hosed.  It works, but if you
> >   try to down the interface, it hangs.  It seems that the "head
> >   pointer" for one of the rings gets corrupted and this makes the
> >   driver go into an infinite loop.  I can break into that loop using
> >   CTRL-ALT-ESC though (sysctl ddb.console=1).  I'm investigating this
> >   issue.
> >
> > * Sometimes I get a kernel that always produces a
> >
> >     "unhibernate failed: original kernel changed"
> >
> >   message.
> >
> 
> Some comments -
> 
> 1. if unhibernate tries to unhibernate but fails (wrong kernel, etc), you are
>    certainly going to have a hosed machine. This is because the unhibernating
>    kernel is booting in a neutered mode where a bunch of devices are disabled,
>    as well as all the APs. At best, this leads to a weird experience; at
>    worst, things hang or crash later. Theo and I have discussed what we
>    should do in this case, since there is no way to rewind autoconf and
>    "retry". I suggested just rebooting; Theo suggested maybe some
>    informational panic message. I'm not sure if this is what you are seeing
>    in any of the above cases, but I wanted to point that out.

I'm obviously seeing this when I get the "original kernel changed"
failure.  I was somewhat confused why I couldn't ssh into the machine
at first, but yes, I realized that we booted without qwx(4) and from
then on just reboot when I end up in this case.

> 2. regarding the "original kernel changed" - the only way this happens is if
>    you booted, changed your kernel, then hibernated. there is no other way I can
>    see this happening. I have done this in the past and seen the same:
> 
>     a. boot machine
>     b. someone asks me to test a diff, or I'm testing a diff of my own
>     c. build and install new kernel, but not ready for reboot yet (doing other
>        things)
>     d. forget I installed a new kernel and ZZZ
>     e. reboot, unhibernate prints that message.
> 
> The code that does this signature check is:
> 
>         SHA256Init(&ctx);
>         SHA256Update(&ctx, version, strlen(version));
>         fn = printf;
>         SHA256Update(&ctx, &fn, sizeof(fn));
>         fn = malloc;
>         SHA256Update(&ctx, &fn, sizeof(fn));
>         fn = km_alloc;
>         SHA256Update(&ctx, &fn, sizeof(fn));
>         fn = strlen;
>         SHA256Update(&ctx, &fn, sizeof(fn));
>         SHA256Final((u_int8_t *)&hib->kern_hash, &ctx);
> 
> ... so it just fingerprints a bunch of things and then does a sha256 compare
> on unpack.
> 
> I don't know how to prevent that footgun however, aside from moving all the
> signature checking up into the bootloader and not even attempting the
> unhibernate if we see this situation. That doesn't "fix" the problem but at
> least you aren't running on some halfway-autoconf'ed kernel when it fails.
> Moving this stuff into the bootloader is not trivial; I tried this in
> 2010-2011 and gave up.
> 
> Regarding the other things (device issues, hangs, etc), I have some ideas on
> how to potentially print more information but it needs to be coded.

I'm 100% sure that I am booting the correct kernel.  The checksum
calculated by that code above is the same.  But for some reason the
checksum that we read back from the hibernation info on disk is
all-zeroes.  So something is going wrong.  Will dig deeper when I have
time.
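
To make the distinction above concrete, here is a rough standalone sketch
(not the kernel code) of how a check could tell "the on-disk hash was never
written or got lost" apart from "the kernel really changed".  The names
hib_hash_is_zero, hib_hash_classify and the enum are made up for
illustration; only the 32-byte SHA-256 digest length comes from the code
quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIB_HASH_LEN 32	/* SHA-256 digest length in bytes */

/* Hypothetical result codes; not taken from the OpenBSD sources. */
enum hib_hash_status {
	HIB_HASH_MATCH,		/* on-disk hash equals the computed one */
	HIB_HASH_MISSING,	/* on-disk hash is all zeroes */
	HIB_HASH_MISMATCH	/* on-disk hash is set but differs */
};

/* Return 1 if every byte of the digest is zero, 0 otherwise. */
static int
hib_hash_is_zero(const uint8_t *h)
{
	uint8_t acc = 0;
	size_t i;

	for (i = 0; i < HIB_HASH_LEN; i++)
		acc |= h[i];
	return (acc == 0);
}

/*
 * Compare the hash read back from disk with the one computed by the
 * booting kernel, reporting an all-zero on-disk hash separately from
 * a genuine mismatch.
 */
static enum hib_hash_status
hib_hash_classify(const uint8_t *disk, const uint8_t *computed)
{
	assert(disk != NULL && computed != NULL);

	if (memcmp(disk, computed, HIB_HASH_LEN) == 0)
		return (HIB_HASH_MATCH);
	if (hib_hash_is_zero(disk))
		return (HIB_HASH_MISSING);
	return (HIB_HASH_MISMATCH);
}
```

With something along these lines, the all-zeroes case could get its own
message instead of being reported as "original kernel changed".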
