** Description changed:
- Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: squashfs_read_data
failed to read block 0x6ff3660032757063
- Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: Unable to read metadata
cache entry [6ff3660032757063]
- Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: squashfs_read_data
failed to read block 0x6261746d79732e
- Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: Unable to read metadata
cache entry [6261746d79732e]
- Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: squashfs_read_data
failed to read block 0x6ff366df00333a37
- Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: Unable to read metadata
cache entry [6ff366df00333a37]
+ 1) Download focal subiquity daily image
+ 2) Boot, press ESC, and edit the boot command line (F6 in BIOS, e in UEFI)
+ 3) Before --- insert the following options
+ bebroken debug init=/bin/bash
+ 4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
- Happens when booting e.g. subiquity disco image. v5.0.0-8-generic kernel
+ 5) You will be dropped into the pivoted root filesystem, before systemd is exec'd
as PID 1
+ 6) /run/initramfs/ will contain a debug log showing how everything was
mounted, i.e. the cdrom is mounted, the squashfs images are set up with losetup
from there, then a multi-lower overlay is set up from them and moved to /root, and
finally pivot-root to /root makes it /. The underlying layers are moved into /cow for your convenience.
+
+ 7) At this point, modifying zero-byte files that exist in the
+ lowest layer but not in the middle one, in certain ways, will result in
+ them being corrupted after / is remounted.
+
+ 8) Exhibit A:
+ $ cat /etc/machine-id
+ (no output)
+ $ systemd-machine-id-setup
+ $ cat /etc/machine-id
+ (some machine id)
+ $ mount -o remount /
+ $ cat /etc/machine-id
+ I/O error
+ with overlay errors in dmesg
+
+ Similarly one can reproduce this with /etc/.pwd.lock & executing
+ systemd-sysusers.
+
+ systemd-machine-id-setup is probably the easiest to trace. It does a
+ simple open, truncate, lseek, write. On boot, the actual remount is done by
+ starting a unit which calls /lib/systemd/systemd-remount-fs.
+
+ Lots of things break once machine-id and .pwd.lock are corrupted, e.g.
+ unable to dhcp, connect to dbus, add/remove/change users or groups, etc.
+
+ We were unable to recreate the issue outside of booting with
+ casper, i.e. statically on a regular host machine without pivot-root. But
+ hopefully booting to a quiet state with nothing running is sufficient to
+ reproduce this.
+
+ Instead of booting with `bebroken init=/bin/bash` you can boot with
+ `bebroken systemd.mask=systemd-remount-fs.service`. This will complete
+ the boot with /etc/machine-id & .pwd.lock modified, meaning that a later
+ remount of / will cause I/O errors on those files.
+
+ Currently, we are shipping two hacks in casper to "rm" the offending
+ files and create them again on the upper rw layer. They then survive
+ remount without I/O errors. However, we'd rather not ship those hacks,
+ and would rather have the kernel overlay fixed to work correctly with
+ multi-lower-dir and not corrupt files upon remounting /.
** Changed in: linux (Ubuntu)
Status: Incomplete => New
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407
Title:
why does booting any livefs squashfs has kernel complaining about
unable to read metadata something rather
Status in linux package in Ubuntu:
New
Bug description:
1) Download focal subiquity daily image
2) Boot, press ESC, and edit the boot command line (F6 in BIOS, e in UEFI)
3) Before --- insert the following options
bebroken debug init=/bin/bash
4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
5) You will be dropped into the pivoted root filesystem, before systemd is exec'd
as PID 1
6) /run/initramfs/ will contain a debug log showing how everything was
mounted, i.e. the cdrom is mounted, the squashfs images are set up with losetup
from there, then a multi-lower overlay is set up from them and moved to /root, and
finally pivot-root to /root makes it /. The underlying layers are moved into /cow for your convenience.
7) At this point, modifying zero-byte files that exist in the
lowest layer but not in the middle one, in certain ways, will result in
them being corrupted after / is remounted.
8) Exhibit A:
$ cat /etc/machine-id
(no output)
$ systemd-machine-id-setup
$ cat /etc/machine-id
(some machine id)
$ mount -o remount /
$ cat /etc/machine-id
I/O error
with overlay errors in dmesg
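The final `cat` in Exhibit A can also be checked programmatically. The following is only a minimal sketch: the helper name `read_or_report` is hypothetical, and it assumes the failure surfaces to userspace as `EIO`, which is what the "I/O error" message above suggests.

```python
import errno


def read_or_report(path):
    """Try to read a file; distinguish an I/O error from a normal read.

    Returns ("ok", data) on success or ("eio", None) when the read fails
    with EIO. Any other error is re-raised unchanged.
    """
    try:
        with open(path, "rb") as f:
            return ("ok", f.read())
    except OSError as exc:
        if exc.errno == errno.EIO:
            return ("eio", None)
        raise
```

Running it against /etc/machine-id before and after `mount -o remount /` would show whether the file moved from the "ok" state to the "eio" state.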
Similarly one can reproduce this with /etc/.pwd.lock & executing
systemd-sysusers.
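To look for other files that might be affected, one could scan the layers for zero-byte regular files that are present in the lowest layer but have no entry in the middle one. This is a rough sketch only: the function name is hypothetical, and the two arguments stand for wherever the lowest and middle layers happen to be mounted under /cow (those exact paths are not given here).

```python
import os
import stat


def zero_byte_only_in_lowest(lowest, middle):
    """Return relative paths of empty regular files found in `lowest`
    that have no corresponding entry at all in `middle`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(lowest):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # do not follow symlinks
            if not stat.S_ISREG(st.st_mode) or st.st_size != 0:
                continue
            rel = os.path.relpath(full, lowest)
            if not os.path.lexists(os.path.join(middle, rel)):
                found.append(rel)
    return sorted(found)
```

On an affected image, /etc/machine-id and /etc/.pwd.lock would be expected to show up in the result, relative to the layer root.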
systemd-machine-id-setup is probably the easiest to trace. It does a
simple open, truncate, lseek, write. On boot, the actual remount is done
by starting a unit which calls /lib/systemd/systemd-remount-fs.
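The open, truncate, lseek, write sequence can be imitated without systemd, which may help when narrowing down the trigger. This is a sketch of that sequence only, not systemd's actual code: the function name, the open flags and the file mode are assumptions.

```python
import os


def rewrite_in_place(path, data):
    """Rewrite a file in place: open, truncate, lseek to 0, write.

    The file is modified through its existing name rather than being
    replaced by a freshly created file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)  # flags/mode assumed
    try:
        os.ftruncate(fd, 0)
        os.lseek(fd, 0, os.SEEK_SET)
        view = memoryview(data)
        while view:  # loop to handle short writes
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

Calling `rewrite_in_place("/etc/machine-id", b"...\n")` in the `init=/bin/bash` shell, followed by the remount from Exhibit A, would show whether this sequence alone is enough to reproduce the corruption.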
Lots of things break once machine-id and .pwd.lock are corrupted, e.g.
unable to dhcp, connect to dbus, add/remove/change users or groups,
etc.
We were unable to recreate the issue outside of booting with
casper, i.e. statically on a regular host machine without pivot-root.
But hopefully booting to a quiet state with nothing running is
sufficient to reproduce this.
Instead of booting with `bebroken init=/bin/bash` you can boot with
`bebroken systemd.mask=systemd-remount-fs.service`. This will complete
the boot with /etc/machine-id & .pwd.lock modified, meaning that a later
remount of / will cause I/O errors on those files.
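For automated test runs, the manual edit of the boot command line (adding options before `---`) could be scripted. A small sketch under stated assumptions: the helper name is hypothetical and so is the example command line in the usage note; only the inserted options come from this report.

```python
def insert_before_separator(cmdline, extra):
    """Insert extra kernel options just before the '---' separator.

    If the command line has no '---' token, the options are appended.
    """
    args = cmdline.split()
    try:
        idx = args.index("---")
    except ValueError:
        idx = len(args)
    return " ".join(args[:idx] + list(extra) + args[idx:])
```

For example, `insert_before_separator("linux /casper/vmlinuz quiet ---", ["bebroken", "systemd.mask=systemd-remount-fs.service"])` places both options ahead of the separator.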
Currently, we are shipping two hacks in casper to "rm" the offending
files and create them again on the upper rw layer. They then survive
remount without I/O errors. However, we'd rather not ship those hacks,
and would rather have the kernel overlay fixed to work correctly with
multi-lower-dir and not corrupt files upon remounting /.
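The idea behind the workaround, removing the file and creating it afresh so that the new file lives on the upper rw layer, can be sketched as follows. This is not the casper code: the function name is hypothetical, and preserving the contents and permission bits is an assumption about what such a hack would want to do.

```python
import os
import stat


def recreate_file(path):
    """Replace a file with a freshly created copy of itself.

    On an overlay mount the unlink hides the lower-layer file and the
    create makes a brand new file, instead of modifying the old one.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    try:
        os.fchmod(fd, mode)  # undo any umask masking of the mode
        view = memoryview(data)
        while view:  # loop to handle short writes
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

Applied to /etc/machine-id and /etc/.pwd.lock before anything modifies them in place, this mirrors the "rm and create again" approach described above.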
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824407/+subscriptions