Hi. Hibernation to a swap partition located on dm-integrity doesn't work. Let me first describe why I need this, then I will describe the bug with steps to reproduce (and some speculation on the cause of the bug).

I want a personal Linux laptop fully protected from data corruption (cosmic rays, etc). And even if data corruption does happen, I want a reliable indication of it, so that I know that I need to restore from backups and replace the faulty hardware. So I did this:

- I bought a laptop with ECC memory: Dell Precision 7780. It seems to be one of the few laptop models in the world with ECC memory. And probably the only model in the world which DOES HAVE ECC memory, but DOES NOT HAVE an nvidia card
- I set up btrfs raid-1

So far, so good. I'm protected from both memory errors and disk errors. But my swap partition stays unprotected.

And yes, I need swap. I have 64 GiB of RAM, but this is still not enough for zillions of Chromium tabs and vscode windows. And, according to the btrfs docs, if I put a swapfile on btrfs, it will be exempt from raid-1 guarantees.

It seems that the only solution in this case is to put the swap partition on top of dm-integrity.

Note: I don't need an error correcting code for my swap. I don't need a self-healing swap. The only thing I need is reliable error detection.

There is a discussion on Stack Exchange about exactly the same problem: https://unix.stackexchange.com/questions/269098/silent-disk-errors-and-reliability-of-linux-swap . I think it reached the same conclusion: the only solution is dm-integrity.

Also, as far as I understand, md raid is not a solution: when reading, it reads from one disk only, and thus doesn't detect all errors. It can detect the remaining errors during a scrub, which is too late (wrong data may already have been consumed by some app).

Also: I don't need encryption.

(Also: there is another solution: "cryptsetup --integrity", but it uses dm-integrity anyway. We will get to it.)

Okay, so I put the swap partition on dm-integrity, and it worked! But then hibernation stopped working. And here come the steps to reproduce.

Okay, so I have a Dell Precision 7780. I bought it a year ago, so I don't think my hardware is faulty. Also, I recently updated the BIOS.

My OS is Debian Trixie amd64. My kernel is Linux 6.12.48-1 from Debian.

I created the swap partition like this:

    integritysetup format --integrity xxhash64 /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7
    integritysetup open --integrity xxhash64 /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 swap
    mkswap /dev/mapper/swap
    swapon /dev/mapper/swap

When I need to activate swap, I do this:

    integritysetup open --integrity xxhash64 /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 swap
    swapon /dev/mapper/swap

When I need to hibernate, I do "systemctl hibernate". And hibernate appears to work.

Then, when I need to resume, I boot into my hand-crafted initramfs. That initramfs does this (I slightly simplified the script):

==
busybox mount -t proc proc /proc
busybox mount -t devtmpfs devtmpfs /dev
busybox mount -t sysfs sysfs /sys
modprobe nvme
modprobe dm-integrity
modprobe xxhash64
sleep 1
integritysetup open --integrity xxhash64 "$LOWER_SWAP_DEV" early-swap
sleep 1
# The following "blkid" command should detect what is present on /dev/mapper/early-swap
TYPE="$(blkid --match-tag TYPE --output value /dev/mapper/early-swap)"
if [ "$TYPE" = 'swsuspend' ]; then
    echo "got hibernation image, trying to resume"
    echo /dev/mapper/early-swap > /sys/power/resume
elif [ "$TYPE" = 'swap' ]; then
    echo 'got normal swap without hibernation image'
    integritysetup close early-swap
    # proceed with fresh boot here
fi
==

And this doesn't work. Hibernate works, resume doesn't. :) "blkid" reports the device as "swap" as opposed to "swsuspend". I suspect this is because hibernation doesn't flush the dm-integrity journal.

I also tried adding "--integrity-bitmap-mode" to "format" and "open". Resume started to work, but when I try to shut down the resumed system, I get errors about a corrupted dm-integrity partition. (Of course, I made the necessary edits to the initramfs script above.)

I also tried adding "--integrity-no-journal" to "format" and "open". It didn't work either.
(I don't remember what exactly didn't work. I can do this experiment again, if needed.)

Then I tried "cryptsetup" instead of "integritysetup". I created the swap partition like this:

    cryptsetup luksFormat --type luks2 /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 /tmp/key
    cryptsetup open --type luks2 --key-file /tmp/key /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 swap
    mkswap /dev/mapper/swap
    swapon /dev/mapper/swap

And, of course, I made all the necessary edits to the initramfs. And this time everything worked. This proves that I didn't make any mistakes in my setup (i.e. I got the initramfs right, etc), and that this is an actual bug in dm-integrity.

Unfortunately, LUKS created this way doesn't have any redundancy. So this is not a solution for me.

So I did this:

    cryptsetup luksFormat --type luks2 --integrity hmac-sha256 /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 /tmp/key
    cryptsetup open --type luks2 --key-file /tmp/key /dev/disk/by-partuuid/c4bbc73d-7909-42ea-8d96-eab82512cbe7 swap
    mkswap /dev/mapper/swap
    swapon /dev/mapper/swap

And this time everything stopped working again. I don't remember what exactly went wrong. As far as I remember, "blkid" again returned "swap" instead of "swsuspend". I can run the experiment again, if needed.

The commands above use dm-integrity internally. So we clearly see: if dm-integrity is involved, then hibernation doesn't work.

Here is a user with exactly the same problem: https://www.reddit.com/r/archlinux/comments/atg18t/hibernation_wipes_swap_and_my_system_hangs_on_boot/ . I.e. "hibernation doesn't work if swap is on LUKS with integrity protection".

So, please, fix this bug. Or tell me how to solve my original problem (i.e. achieving fully reliable error reporting).

I'm available for testing. Send me experimental patches.

I can provide more info, if needed.

There is yet another potential solution: uswsusp ( https://docs.kernel.org/power/userland-swsusp.html ). In short, this is hibernation driven from userspace. I.e. uswsusp allows for finer control of hibernation. Thus, I can write my own userspace util, which will do the hibernation and execute "integritysetup close" after writing the image. Assuming that the original problem is what I think it is (i.e. lack of a journal flush after the image is written), this may work.

The problem is... the latest commit to the userspace implementation is dated 2012-09-15 ( https://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-utils.git/ ). Do I really need to use such ancient technology?

I haven't tested this yet, I will test it in the coming days. Even if it works, this will still mean that dm-integrity is buggy with kernel-based hibernation.

-- 
Askar Safin
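
P.S. For cross-checking what "blkid" reports, the signature can also be read directly from the mapped device. The following is only a minimal sketch, not part of the setup above. It assumes 4096-byte pages (as on amd64) and relies on the swap signature occupying the last 10 bytes of the first page: "SWAPSPACE2" for normal swap, "S1SUSPEND" once the in-kernel hibernation code has written an image. The function names swap_signature and swap_state are made up for illustration.

```shell
#!/bin/sh
# Print the 10-byte signature stored at the end of the first page of a
# swap device (or of a plain file, for testing). NUL padding is stripped.
# Assumption: the page size is 4096 bytes unless given as a second argument.
swap_signature() {
    dev=$1
    page_size=${2:-4096}
    dd if="$dev" bs=1 skip=$((page_size - 10)) count=10 2>/dev/null | tr -d '\000'
}

# Map the raw signature to the names used in the initramfs script.
swap_state() {
    case "$(swap_signature "$@")" in
        SWAPSPACE2) echo swap ;;
        S1SUSPEND)  echo swsuspend ;;
        *)          echo unknown ;;
    esac
}
```

Running `swap_state /dev/mapper/early-swap` in the initramfs right after "integritysetup open" would show whether the hibernation signature is visible through dm-integrity at that point, independently of blkid.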
