On Fri, Dec 31, 2021 at 09:52:58PM -0800, [email protected] wrote:
> On Sun, 26 Dec 2021, Philip Guenther wrote:
> > Installed snap from Friday on my X1 extreme and it's no longer able to
> > resume from hibernation, even when hibernation was done right after
> > boot+login, showing 16 "hibernate_block_io open failed" before showing
> >     unhibernating @ block 47965181 length 6263601556133MB
> >
> >     Unable to resume hibernated image
> >
> > That length seems completely bogus, of course.
> >
> > If I'm reading my /var/log/daemon and /var/log/messages correctly, my
> > successful resume on Dec 25th was with a kernel I built on Dec 3rd.  :-/
>
> Okay, figured it out: it _was_ caused by the change to not attach various
> devices when unhibernating.  That changed the device under which softraid
> reattached my encrypted boot device from sd3 to sd2:
>       Dec 31 14:04:36 bleys /bsd: softraid0 at root
>       Dec 31 14:04:36 bleys /bsd: scsibus4 at softraid0: 256 targets
>       Dec 31 14:04:36 bleys /bsd: sd2 at scsibus4 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006>
>       Dec 31 14:04:36 bleys /bsd: sd2: 244197MB, 512 bytes/sector, 500116577 sectors
>       Dec 31 14:04:36 bleys /bsd: softraid0: volume sd2 is roaming, it used to be sd3, updating metadata
>       Dec 31 14:04:36 bleys /bsd: root on sd2a (8ddcca7f6e4dca69.a) swap on sd2b dump on sd2b
>

Likely caused by originally booting with a umass device plugged in, then ZZZ'ing,
then un-ZZZ'ing (which doesn't attach umass and thus reorders the sd* numbering).

> The bug is that the hibernate resume logic would read the signature from
> the correct device, but then use the device recorded in that to try to
> read the rest:
>       Dec 31 14:04:36 bleys /bsd: hibernate_block_io open failed
>       Dec 31 14:04:36 bleys last message repeated 15 times
>       Dec 31 14:04:36 bleys /bsd: unhibernating @ block 47965181 length 6263601556133MB
>       Dec 31 14:04:36 bleys /bsd: unhibernating @ block 47965181 length 6263601556133MB
>       Dec 31 14:04:36 bleys /bsd: Unable to resume hibernated image
>
>
> The fix is a literal one-liner: use the device we read the signature from
> for the entire resume.
>
> Index: kern/subr_hibernate.c
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/kern/subr_hibernate.c,v
> retrieving revision 1.129
> diff -u -p -r1.129 subr_hibernate.c
> --- kern/subr_hibernate.c     31 Aug 2021 14:45:25 -0000      1.129
> +++ kern/subr_hibernate.c     1 Jan 2022 05:18:21 -0000
> @@ -1173,6 +1173,7 @@ hibernate_resume(void)
>               splx(s);
>               return;
>       }
> +     disk_hib.dev = hib.dev;
>
>  #ifdef MULTIPROCESSOR
>       /* XXX - if we fail later, we may need to rehatch APs on some archs */
>
>
> Resume works with that.  Well, 'mostly': I've seen a couple "freed pool
> modified" panics during resume, where it's back on the resumed kernel and
> actually drops into ddb.  The second time I at least noted the pool:
> dma32768...which makes me think some device isn't being correctly handled
> after the "don't attach everyone on unhibernate" change.  :-|
>
> I'll try to gather more data, but at least the change above seems clearly
> correct.
>
>
> Philip Guenther
>

sure, ok mlarkin
