Package: pm-utils
Version: 1.4.1-15
Severity: normal

resuming from a hibernation image on a swap partition after the root
file system was mounted in between can result in file system corruption.

this doesn't happen when following the debian-installer's recommended
setup and not dual-booting, but can still happen in practical setups.


steps to reproduce
------------------

as far as i can tell, resuming after anything has written to the disks
in between is not *expected* to go right, but i'll show that this can
happen inadvertently in some setups:

* run debian 8.2 installer on a virtual machine (no gui required):
  create a /boot partition, a luks encrypted partition for swap, and a
  luks encrypted partition for / with btrfs on top.
* boot up (as expected, you're asked for the root and swap passwords)
* install pm-utils
* pm-hibernate
* (this is a good point in time to snapshot the vm)
* boot up, but mistype the swap password three times in a row.
* when systemd asks for a new swap password, wait for it to time out.
* log in, maybe touch a new file in /, and shut down.
* boot up again, now entering the passwords correctly.
* wait for the file system corruption that has ensued to bite you, or
  provoke it like this:

> $ ls /
> (no error, but a file touched during the boot without resuming doesn't show up)
> # btrfs scrub start /
> scrub started on /, fsid ... (pid=...)
> [ ... ] BTRFS: bdev /dev/mapper/sda3_crypt errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
> [ ... ] BTRFS: bdev /dev/mapper/sda3_crypt errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
> [ ... ] BTRFS: unable to fixup (regular) error at logical 13090816 on dev /dev/mapper/sda3_crypt
> # sudo btrfs scrub status /
> scrub status for ...
>         scrub started at ..., running for 10 seconds
>         total bytes scrubbed: 59.10MiB with 3 errors
>         error details: super=2 csum=1
>         corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
> ERROR: There are uncorrectable errors.
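
for checking this non-interactively (e.g. after restoring the vm snapshot
several times), the scrub status output can be parsed. this is only a
sketch: `scrub_uncorrectable` is a name made up for illustration, and it
assumes the status output has the "uncorrectable errors: N" form shown
above.

```shell
#!/bin/sh
# sketch only: scrub_uncorrectable is a hypothetical helper name.
# reads `btrfs scrub status` output on stdin and prints the number that
# follows "uncorrectable errors:" (prints nothing if the line is absent).
scrub_uncorrectable() {
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p'
}

# intended use (needs root and a btrfs file system mounted on /):
#   n=$(btrfs scrub status / | scrub_uncorrectable)
#   [ "${n:-0}" -eq 0 ] || echo "corruption detected"
```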

(that was my setup for reproducing this. i've observed the same issues
on my sid production machine with kernel 4.1 as well.)


other situations affected
-------------------------

apart from encrypted volumes, i assume the same situation can occur with
other setups as well:

1) swap on an external device with a bad connection (e.g. a small fanless
   system with partitions on usb sticks, or a desktop with esata hard disks)

2) picking a wrong kernel on resuming that doesn't mount the swap
   partition

3) booting another os in between (that's an obvious then-don't-do-that for
   people who know what suspend-to-disk does,


approaches
----------

* hide the bootloader after hibernating[1]. this only solves 2) and 3),
  and only if grub is actually the means used to select the os.

* don't allow the prompt for a new password to expire (doesn't solve any
  of 1)/2)/3), and requires /etc/crypttab to contain the swap option,
  which might not be present on older systems)

[1] https://bugs.launchpad.net/ubuntu/+source/pm-utils/+bug/42376
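
as a possible building block for any such approach, a boot-time script
could at least detect that a swap device still holds an unresumed image.
the sketch below relies on the in-kernel hibernation code replacing the
swap signature in the last 10 bytes of the first page with "S1SUSPEND".
`has_hibernation_image` is a hypothetical helper, not something pm-utils
provides; userspace hibernation tools may write a different signature,
and the check obviously can't work while an encrypted swap device is
still locked (the case reproduced above).

```shell
#!/bin/sh
# sketch only: has_hibernation_image is a hypothetical helper, not part
# of pm-utils. it covers in-kernel hibernation images only.

# has_hibernation_image DEV
# returns 0 if DEV's first page ends with the kernel's hibernation
# signature, non-zero otherwise (including when DEV is unreadable).
has_hibernation_image() {
    dev=$1
    # assumption: fall back to 4096 if getconf is unavailable
    pagesize=$(getconf PAGESIZE 2>/dev/null || echo 4096)
    # the signature field is the last 10 bytes of the first page;
    # read the 9 printable characters, skipping the trailing NUL
    sig=$(dd if="$dev" bs=1 skip=$((pagesize - 10)) count=9 2>/dev/null)
    [ "$sig" = "S1SUSPEND" ]
}
```

a caller might then do something like
`has_hibernation_image /dev/mapper/sda2_crypt && echo "unresumed image present"`
(the device name is an assumption) and refuse to mount / read-write in
that case.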


how do you think such problems can be avoided? can pm-utils solve this,
or is help from the kernel / file system drivers required?


best regards
chrysn
