On Thu, May 18, 2017 at 10:07 +0200, Sebastien Marie wrote:
> On Fri, May 12, 2017 at 03:11:35PM +0000, Natasha Kerensikova wrote:
> > >Synopsis:  Suspend-to-disk doesn't work anymore
> > >Category:  <PR category (one line)>
> > >Environment:
> >     System      : OpenBSD 6.1
> >     Details     : OpenBSD 6.1-current (GENERIC.MP) #6: Fri May 12 15:12:39 CEST 2017
> >                      [email protected]:/data/semarie/repos/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> >     Architecture: OpenBSD.amd64
> >     Machine     : amd64
> > >Description:
> >     On my Thinkpad X220 (with Core i5) with full disk encryption,
> >     OpenBSD doesn't resume after suspend to disk since my latest snapshot
> >     update (May 7th snapshot). Keeping the same userland and using kernels
> >     helpfully provided by semarie, we bisected the problem to the commits
> >     detailed below.
> > >How-To-Repeat:
> >     Suspend-to-disk a live OpenBSD. On next boot, it should resume from
> >     disk, but instead it starts a standard boot with dirty filesystems.
> > >Fix:
> >     Reverting the commits identified on github mirror by the hashes
> >     d223d7cb85c1f2f705da547a0134b949655abe6a ("Switch glxsb(4), VIA
> >     padlock and AES-NI drivers over to the new AES") and
> >     cb3087542b323ec5bf5db9dc64f0d54dc40cca40 ("Switch OCF and IPsec over
> >     to the new AES") fixes the problem, at least until commit
> >     50f8ee3e5db5b40ae9a05db4742b05e8d975573d (May 11th).
> > 
> 
> With Natacha, we continued a bit to try to debug the problem.
>

Thank you for the follow-up mail.  If you can find more info,
that would be helpful.

> By activating HIB_DEBUG, the resume showed that it failed due to a wrong
> magic number:
> 
> [...]
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
> sd1: 953866MB, 512 bytes/sector, 1953519473 sectors
> root on sd1a (63848a4fade4a944.a) swap on sd1b dump on sd1b
> reading hibernate signature block location: 8641783
> wrong magic number in hibernate signature: e82daa08
> 
> I am unsure of the reason: it could be the hibernate part that doesn't
> write it correctly, or the resume part that doesn't read it correctly.
> I dunno.
> 
> By "correctly" I mean: wrong AES key? Use of an uninitialised or garbage
> struct? Something that results in a "bad state" on writing or reading.
> 
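
To illustrate the "wrong key" hypothesis above, here is a minimal sketch,
not the kernel code path. It uses a toy SHA-256 keystream cipher instead of
AES-XTS, and the MAGIC value, the 512-byte block layout and all function
names are assumptions made up for the example. The point is only that a
block written with one key state and read back with another decrypts to an
arbitrary 32-bit word, which is the kind of symptom the HIB_DEBUG output
shows.

```python
import hashlib
import struct

# Placeholder magic for this sketch (an assumption, not read from the tree).
MAGIC = 0x0B5D0B5D


def keystream(key: bytes, sector: int, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. This is NOT AES-XTS."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + struct.pack("<QQ", sector, counter)).digest()
        counter += 1
    return out[:length]


def crypt(key: bytes, sector: int, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, sector, len(data))))


def read_magic(key: bytes, sector: int, on_disk: bytes) -> int:
    """Decrypt a block and return its first 32-bit little-endian word."""
    return struct.unpack_from("<I", crypt(key, sector, on_disk))[0]


if __name__ == "__main__":
    sector = 8641783  # signature block location taken from the log above
    block = struct.pack("<I", MAGIC) + bytes(508)  # hypothetical 512-byte block
    on_disk = crypt(b"key state at suspend", sector, block)
    for key in (b"key state at suspend", b"key state at resume"):
        magic = read_magic(key, sector, on_disk)
        verdict = "ok" if magic == MAGIC else "wrong magic number"
        print(f"{key!r}: {magic:#010x} {verdict}")
```

The sketch cannot tell whether the write side or the read side is at fault:
either one using a bad key or a corrupted context gives the same result.
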
> 
> I applied the last commit, which reverts AES_XTS to rijndael, on
> top of the tested tree (7 days old). Hibernate/resume works again.
> 
> This confirms the problem is related to the switch to
> constant-time AES in the context of full-disk encryption.
>

Thanks for verifying this.  I've looked through the sr_hibernate_io
(that's hib->io_func) but couldn't find anything wrong with it. The
only thing that springs to mind is that AES_CTX and therefore the
XTS context (aes_xts_ctx) is larger and requires more stack space.
Though I can't see what might be affected by that.
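
To put rough numbers on the stack-space remark, here is a minimal sketch
using Python's ctypes to compute structure sizes. The field layouts below
are assumptions for illustration (a table-based context with separate
encryption and decryption schedules, and a constant-time context that also
keeps an expanded schedule); they are not copied from the tree, and the
class and function names are made up.

```python
import ctypes


class TableAesCtx(ctypes.Structure):
    """Hypothetical table-based context: two 60-word key schedules."""
    _fields_ = [
        ("enc_only", ctypes.c_int),
        ("rounds", ctypes.c_int),
        ("ek", ctypes.c_uint32 * 60),
        ("dk", ctypes.c_uint32 * 60),
    ]


class ConstTimeAesCtx(ctypes.Structure):
    """Hypothetical constant-time context: schedule plus expanded copy."""
    _fields_ = [
        ("sk", ctypes.c_uint32 * 60),
        ("sk_exp", ctypes.c_uint32 * 120),
        ("rounds", ctypes.c_uint),
    ]


def xts_ctx_size(cipher_ctx) -> int:
    """Size of an XTS-style context: two cipher contexts and a 16-byte tweak."""
    class XtsCtx(ctypes.Structure):
        _fields_ = [
            ("key1", cipher_ctx),
            ("key2", cipher_ctx),
            ("tweak", ctypes.c_uint8 * 16),
        ]
    return ctypes.sizeof(XtsCtx)


if __name__ == "__main__":
    for name, ctx in (("table-based", TableAesCtx),
                      ("constant-time", ConstTimeAesCtx)):
        print(f"{name}: cipher ctx {ctypes.sizeof(ctx)} bytes, "
              f"XTS ctx {xts_ctx_size(ctx)} bytes")
```

Under these assumed layouts the XTS context grows by a few hundred bytes,
which only matters if it lives on a tightly bounded stack.
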

> Regarding the problem itself, I don't know the crypto part and the
> initialisation code path well enough to figure out the reason. Does
> aes.c have some initialisation that would happen later than in
> rijndael.c, resulting in a first read from disk with a wrong key or an
> uninitialised structure? I dunno.

No.  Otherwise we would see this kind of issue elsewhere.

> I just hope this problem doesn't hide a more subtle underlying problem.
>

It probably does.

> I expect the problem to be fixed in the next snapshot (one including the
> revert of AES_XTS to rijndael).
> 
> Thanks.
> -- 
> Sebastien Marie
> 
