@Ike: Let me know if the above sounds plausible. I should also add the following notes, which may count as evidence against my theory of pre-existing corruption:
It is possible that mkfs.ext4 was run outside of sudo (e.g. directly in a root shell), in which case it would not be logged in /var/log/auth.log.

It is also possible that testing was running in an ssh shell, so the commands I gleaned from the console may not be the full picture.

While it is true that the first mount of /dev/sda2 after applying Ted's patch reported "mounting fs with errors", subsequent mounts do not have that message, and yet "bad block bitmap checksum" messages still follow them:

  Jul 9 08:19:42 d05-4 kernel: [ 139.572607] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
  Jul 9 09:17:09 d05-4 kernel: [ 3586.211348] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383: comm stress-ng: bg 4705: bad block bitmap checksum
  Jul 9 09:37:34 d05-4 kernel: [ 4810.952360] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
  Jul 9 10:34:58 d05-4 kernel: [ 8254.776992] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383: comm stress-ng: bg 4193: bad block bitmap checksum

Does that mean the filesystem was somehow no longer corrupted at that time (e.g. fsck'd, or re-mkfs'd), or is it possible that the fs-has-errors flag was just cleared while the corruption persisted?

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1780137

Title:
  [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99: comm stress-ng: Corrupt inode bitmap

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  We're seeing a very reproducible regression in the bionic kernel, triggered by the stress-ng chdir test performed by the Ubuntu certification suite. We see this on both the HiSilicon D05 arm64 server and the HiSilicon D06 arm64 server. We have been unable to reproduce it on other servers so far.
  [Test Case]
  $ sudo apt-add-repository -y ppa:hardware-certification/public
  $ sudo apt install -y canonical-certification-server
  $ sudo mkfs.ext4 /dev/sda1 (Obviously, this should not be your root disk!!)
  $ sudo /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng sda --base-time 240 --really-run

  This test runs a series of stress-ng tests against /dev/sda, and fails on the "chdir" test. To speed up reproduction, reduce the test list to just "chdir" in the disk_stress_ng script. Attempts to reproduce this directly with stress-ng have failed, presumably because of other environment setup that this script performs (e.g. setting aio-max-nr to 524288).

  Our reproduction procedure uses a non-root disk, because the test can lead to corruption, and runs mkfs.ext4 on the partition just before the test so that it starts from a pristine filesystem state.

  I bisected this down to the following commit:

  commit 555bc9b1421f10d94a1192c7eea4a59faca3e711
  Author: Theodore Ts'o <ty...@mit.edu>
  Date:   Mon Feb 19 14:16:47 2018 -0500

      ext4: don't update checksum of new initialized bitmaps

      BugLink: http://bugs.launchpad.net/bugs/1773233

      commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137/+subscriptions
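
For anyone wanting to repeat the log comparison made in the comment above (which mounts carried the "mounting fs with errors" warning, and which bitmap checksum errors followed each mount), here is a rough Python sketch. It is only an illustration, not a tool used in this bug: the function name errors_per_mount and the regular expressions are my own assumptions, written to match just the four kernel log lines quoted in the comment, and other kernels may format these messages differently.

```python
import re

# Kernel log lines copied from the comment above.
LOG = """\
Jul 9 08:19:42 d05-4 kernel: [ 139.572607] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
Jul 9 09:17:09 d05-4 kernel: [ 3586.211348] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383: comm stress-ng: bg 4705: bad block bitmap checksum
Jul 9 09:37:34 d05-4 kernel: [ 4810.952360] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
Jul 9 10:34:58 d05-4 kernel: [ 8254.776992] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383: comm stress-ng: bg 4193: bad block bitmap checksum
"""

# Assumed patterns, derived only from the lines quoted above.
MOUNT_RE = re.compile(r"\[\s*([\d.]+)\] EXT4-fs \((\w+)\): mounted filesystem")
ERROR_RE = re.compile(
    r"\[\s*([\d.]+)\] EXT4-fs error \(device (\w+)\): "
    r"(\w+):\d+: comm (\S+): bg (\d+): (.*)$"
)
WARNING_TEXT = "mounting fs with errors"


def errors_per_mount(log_text, device):
    """Group EXT4 errors for `device` under the mount that preceded them."""
    sessions = []
    warned = False  # set when the warning is seen before the next mount
    for line in log_text.splitlines():
        if WARNING_TEXT in line and f"({device})" in line:
            warned = True
            continue
        mount = MOUNT_RE.search(line)
        if mount and mount.group(2) == device:
            sessions.append({
                "mounted_at": float(mount.group(1)),
                "warned": warned,
                "errors": [],
            })
            warned = False
            continue
        error = ERROR_RE.search(line)
        if error and error.group(2) == device and sessions:
            at = float(error.group(1))
            sessions[-1]["errors"].append({
                "at": at,
                "function": error.group(3),
                "comm": error.group(4),
                "group": int(error.group(5)),
                "message": error.group(6),
                # Seconds of uptime between the mount and this error.
                "after_mount": at - sessions[-1]["mounted_at"],
            })
    return sessions


if __name__ == "__main__":
    for session in errors_per_mount(LOG, "sda2"):
        print(f"mount at {session['mounted_at']:.0f}s, "
              f"warned={session['warned']}")
        for err in session["errors"]:
            print(f"  +{err['after_mount']:.0f}s bg {err['group']}: "
                  f"{err['message']}")
```

Run against a full kern.log, this would show at a glance whether each error belongs to a mount that was flagged as having errors. It cannot tell whether the flag was cleared by fsck, by a fresh mkfs, or by something else; that question still needs an answer from someone who knows the ext4 code.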