On Fri, Dec 31, 2021 at 09:52:58PM -0800, [email protected] wrote:
> On Sun, 26 Dec 2021, Philip Guenther wrote:
> > Installed snap from Friday on my X1 extreme and it's no longer able to
> > resume from hibernation, even when hibernation was done right after
> > boot+login, showing 16 "hibernate_block_io open failed" before showing
> >     unhibernating @ block 47965181 length 6263601556133MB
> >
> >     Unable to resume hibernated image
> >
> > That length seems completely bogus, of course.
> >
> > If I'm reading my /var/log/daemon and /var/log/messages correctly, my
> > successful resume on Dec 25th was with a kernel I built on Dec 3rd.  :-/
>
> Okay, figured it out: it _was_ caused by the change to not attach various
> devices when unhibernating.  That changed the device under which softraid
> reattached my encrypted boot device from sd3 to sd2:
>     Dec 31 14:04:36 bleys /bsd: softraid0 at root
>     Dec 31 14:04:36 bleys /bsd: scsibus4 at softraid0: 256 targets
>     Dec 31 14:04:36 bleys /bsd: sd2 at scsibus4 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006>
>     Dec 31 14:04:36 bleys /bsd: sd2: 244197MB, 512 bytes/sector, 500116577 sectors
>     Dec 31 14:04:36 bleys /bsd: softraid0: volume sd2 is roaming, it used to be sd3, updating metadata
>     Dec 31 14:04:36 bleys /bsd: root on sd2a (8ddcca7f6e4dca69.a) swap on sd2b dump on sd2b
>
Likely caused by originally booting with a umass plugged in, then ZZZ'ing,
then un-ZZZ'ing (which doesn't attach umass and thus reorders sd*).

> The bug is that the hibernate resume logic would read the signature from
> the correct device, but then use the device recorded in that to try to
> read the rest:
>     Dec 31 14:04:36 bleys /bsd: hibernate_block_io open failed
>     Dec 31 14:04:36 bleys last message repeated 15 times
>     Dec 31 14:04:36 bleys /bsd: unhibernating @ block 47965181 length 6263601556133MB
>     Dec 31 14:04:36 bleys /bsd: unhibernating @ block 47965181 length 6263601556133MB
>     Dec 31 14:04:36 bleys /bsd: Unable to resume hibernated image
>
> The fix is a literal one-liner: use the device we read the signature from
> for the entire resume.
>
> Index: kern/subr_hibernate.c
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/kern/subr_hibernate.c,v
> retrieving revision 1.129
> diff -u -p -r1.129 subr_hibernate.c
> --- kern/subr_hibernate.c	31 Aug 2021 14:45:25 -0000	1.129
> +++ kern/subr_hibernate.c	1 Jan 2022 05:18:21 -0000
> @@ -1173,6 +1173,7 @@ hibernate_resume(void)
>  		splx(s);
>  		return;
>  	}
> +	disk_hib.dev = hib.dev;
>
>  #ifdef MULTIPROCESSOR
>  	/* XXX - if we fail later, we may need to rehatch APs on some archs */
>
>
> Resume works with that.  Well, 'mostly': I've seen a couple "freed pool
> modified" panics during resume, where it's back on the resumed kernel and
> actually drops into ddb.  The second time I at least noted the pool:
> dma32768... which makes me think some device isn't being correctly handled
> after the "don't attach everyone on unhibernate" change.  :-|
>
> I'll try to gather more data, but at least the change above seems clearly
> correct.
>
>
> Philip Guenther

sure, ok

mlarkin
