On Thu, May 18, 2017 at 10:07 +0200, Sebastien Marie wrote:
> On Fri, May 12, 2017 at 03:11:35PM +0000, Natasha Kerensikova wrote:
> > >Synopsis: Suspend-to-disk doesn't work anymore
> > >Category: <PR category (one line)>
> > >Environment:
> > System : OpenBSD 6.1
> > Details : OpenBSD 6.1-current (GENERIC.MP) #6: Fri May 12 15:12:39
> > CEST 2017
> >
> > [email protected]:/data/semarie/repos/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > On my ThinkPad X220 (with Core i5) with full disk encryption,
> > OpenBSD no longer resumes after suspend-to-disk since my latest snapshot
> > update (May 7th snapshot). Keeping the same userland and using kernels
> > helpfully provided by semarie, we bisected the problem to the commits
> > detailed below.
> > >How-To-Repeat:
> > Suspend-to-disk a live OpenBSD. On the next boot it should resume from
> > disk, but instead it performs a standard boot with dirty filesystems.
> > >Fix:
> > Reverting the commits identified on the GitHub mirror by the hashes
> > d223d7cb85c1f2f705da547a0134b949655abe6a ("Switch glxsb(4), VIA
> > padlock and AES-NI drivers over to the new AES") and
> > cb3087542b323ec5bf5db9dc64f0d54dc40cca40 ("Switch OCF and IPsec over
> > to the new AES") fixes the problem, at least up to commit
> > 50f8ee3e5db5b40ae9a05db4742b05e8d975573d (May 11th).
> >
>
> With Natacha, we continued trying to debug the problem a bit further.
>
Thank you for the follow-up mail. If you can find more info, that would be helpful.

> By enabling HIB_DEBUG, we saw that the resume failed because of a wrong
> magic number:
>
> [...]
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
> sd1: 953866MB, 512 bytes/sector, 1953519473 sectors
> root on sd1a (63848a4fade4a944.a) swap on sd1b dump on sd1b
> reading hibernate signature block location: 8641783
> wrong magic number in hibernate signature: e82daa08
>
> I am unsure of the reason: either the hibernate side doesn't write it
> correctly, or the resume side doesn't read it correctly. I don't know.
>
> By "correctly" I mean: wrong AES key? Use of an uninitialised or garbage
> struct? Something that results in a "bad state" when writing or reading.
>
>
> I applied the latest commit, which reverts AES_XTS to rijndael, on
> top of the tested tree (7 days old). Hibernate/resume works again.
>
> This confirms the problem is related to the switch to
> constant-time AES in the context of full-disk encryption.
>

Thanks for verifying this. I've looked through sr_hibernate_io (that's
hib->io_func) but couldn't find anything wrong with it. The only thing
that springs to mind is that AES_CTX, and therefore the XTS context
(aes_xts_ctx), is larger and requires more stack space, though I can't
see what might be affected by that.

> Regarding the problem itself, I don't know the crypto part and the
> initialisation code path well enough to figure out the reason. Does aes.c
> have some initialisation that happens later than in rijndael.c, resulting
> in a first disk read with the wrong key or an uninitialised structure?
> I don't know.

No. Otherwise we would see this kind of issue elsewhere.

> I just hope this problem doesn't hide a more subtle underlying problem.
>

It probably does.

> I expect the problem to be fixed in the next snapshot (one that includes
> the revert of AES_XTS to rijndael).
>
> Thanks.
> --
> Sebastien Marie
>
