Hi Dimitri,
while debugging this I found the following in setup_unionfs() in
scripts/casper:
	# move the first mount; no head in busybox-initramfs
	for d in $(mount -t squashfs | cut -d\  -f 3); do
		mkdir -p "${rootmnt}/rofs"
		if [ "${UNIONFS}" = unionfs-fuse ]; then
			mount -o bind "${d}" "${rootmnt}/rofs"
		else
			mount -o move "${d}" "${rootmnt}/rofs"
		fi
		break
	done
and looking at the debug /run/initramfs/initramfs.debug log for the
above stanza I see:
+ cut '-d ' -f 3
+ mount -t squashfs
+ mkdir -p /root/rofs
+ '[' overlay '=' unionfs-fuse ]
+ mount -o move /filesystem.squashfs /root/rofs
+ break
However, I cannot reproduce this mount -o move operation by hand, as
I get the mount error:
mount: /root/rofs: /filesystem.squashfs is not a block device.
It appears to me that the mount in scripts/casper silently ignores
this failure. Should the mount be a bind mount instead?
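
To make such a failure visible in the debug log, the exit status of the
move could be checked explicitly. The following is only a sketch:
move_or_report and the MOUNT_CMD override are hypothetical names that do
not exist in scripts/casper, and whether a bind mount is the right
fallback is still the open question above.

```shell
# Hypothetical helper, not part of scripts/casper.
# MOUNT_CMD is an assumed override so the function can be dry-run
# with a stub instead of the real mount binary.
move_or_report() {
	src=$1
	dst=$2
	${MOUNT_CMD:-mount} -o move "$src" "$dst"
	rc=$?
	if [ "$rc" -ne 0 ]; then
		# Report the failure instead of carrying on silently
		printf 'casper: mount -o move %s %s failed (exit %s)\n' \
			"$src" "$dst" "$rc" >&2
	fi
	return "$rc"
}
```

In the stanza above it would be called as
move_or_report "${d}" "${rootmnt}/rofs" in place of the bare mount -o move.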
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407
Title:
remount of multilower moved pivoted-root overlayfs root, results in
I/O errors on some modified files
Status in linux package in Ubuntu:
Confirmed
Status in linux-hwe package in Ubuntu:
Invalid
Status in linux-hwe source package in Bionic:
Confirmed
Bug description:
1) Download focal subiquity pending image, or eoan release image
2) Boot, press ESC and edit the boot command line (F6 in BIOS, e in UEFI)
3) After --- insert the following options
break=top debug init=/bin/bash
4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
5) in the initramfs execute:
rm /scripts/casper-bottom/25adduser
exit
6) You will be dropped into the pivoted root filesystem, before systemd
is execed as PID 1
7) /run/initramfs/ will contain a debug log showing how everything was
mounted, i.e. the cdrom is mounted, the squashfs images are set up with
losetup from there, a multilower overlay is set up from them and moved to
/root, and finally pivot-root to /root makes it /. The underlying layers
are moved into /cow for your convenience.
8) At this point, modifying zero-byte files that exist in the lowest
layer but not the middle one, in certain ways, will result in them
being corrupted after / is remounted.
9) Corruption examples
(On both focal & eoan)
cat /etc/.pwd.lock
systemd-sysusers
cat /etc/.pwd.lock
mount -o remount /
cat /etc/.pwd.lock
overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
cat: /etc/.pwd.lock: Input/output error
(Only on eoan)
cat /etc/machine-id
systemd-machine-id-setup
cat /etc/machine-id
mount -o remount /
cat /etc/machine-id
overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
cat: /etc/machine-id: Input/output error
Lots of things break once machine-id and .pwd.lock are corrupted. I.e.
unable to dhcp, connect to dbus, add/remove/change users or groups,
etc.
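
As a rough way to find other files matching the description in step 8
(zero-byte in the lowest layer, absent from the middle one), something
like the following could be run from the pivoted root. This is a sketch
only: list_at_risk is a hypothetical name, and the two layer directories
are passed as arguments because their exact paths under /cow are not
given here.

```shell
# Hypothetical helper: print paths (relative to the layer root) of
# zero-byte regular files in the lowest layer that have no
# counterpart in the middle layer.
list_at_risk() {
	lowest=$1
	middle=$2
	# -size 0 matches only empty files (sizes are rounded up to blocks)
	( cd "$lowest" && find . -type f -size 0 ) |
	while IFS= read -r rel; do
		[ -e "$middle/$rel" ] || printf '%s\n' "${rel#./}"
	done
}
```

It takes the mount points of the lowest and middle layers as its two
arguments. A file being listed does not mean it will be corrupted, only
that it fits the pattern seen with .pwd.lock and machine-id.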
We were unable to recreate the issue outside of booting with casper,
i.e. statically on a regular host machine without pivot-root.
But hopefully booting to a quiet state with nothing running is
sufficient to reproduce this.
Instead of booting with `bebroken init=/bin/bash` you can boot with
`bebroken systemd.mask=systemd-remount-fs.service`. This will complete
the boot with /etc/machine-id & .pwd.lock modified, meaning that a
remount of / will cause I/O errors on those files.
Currently, we are shipping two hacks in casper's 25adduser script to
"rm" the offending files, and create them again on the upper rw layer.
They then survive remount without I/O errors. However, we'd rather not
ship those hacks, and would rather have the kernel overlay fixed to work
correctly with multi-lower-dir and not corrupt files upon remounting /.
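
For illustration, the kind of workaround described above could look
roughly like this. It is a sketch, not the actual code from 25adduser:
recreate_on_upper is a hypothetical name, and it assumes the files only
need to exist and be empty (ownership and mode are not restored).

```shell
# Hypothetical sketch of the rm-and-recreate workaround.
# $1 is the root of the overlay mount, the remaining arguments are
# paths relative to it.
recreate_on_upper() {
	root=$1
	shift
	for rel in "$@"; do
		f="$root/$rel"
		# Only replace regular files that are still empty, so that
		# real content is never thrown away
		if [ -f "$f" ] && [ ! -s "$f" ]; then
			rm -f "$f" && : > "$f"
		fi
	done
}
```

For the two files from the corruption examples this would be called as
recreate_on_upper "${rootmnt}" etc/.pwd.lock etc/machine-id before the
pivot.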