Hello Ted,

I have tried four different computers, and the problem is reproducible on
all of them after an unclean shutdown.

NUC11TNKi7, NUC5i5RYB, DA-1100 and Asrock SOM-P104J SBC

*NUC11TNKi7:*
Manufacturer - Intel Corporation
Processor - 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Disk - nvme
Type - HDD
Disk model - TS128GMTE110S

*NUC5i5RYB:*
Manufacturer - Intel Corporation
Processor - Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Disk type - HDD
Disk model - TS64GMTS800

*SOM-P104J SBC:*
Manufacturer - ASRock Industrial
Processor - Intel(R) N97
Disk - nvme
Disk model - TS128GMTE110S

*DA-1100:*
Manufacturer - CINCOZE
Processor - Intel(R) Pentium(R) CPU N4200 @ 1.10GHz
Disk model - TS64GMSA372I


It does not happen under Bookworm. I even tried upgrading the kernel to 6.9.7
from bookworm-backports, with e2fsprogs-1.47.0-2, on the same 4 computers, and
the computers boot normally after every unclean shutdown.

*How to reproduce it:*
Install stock trixie

   1. apt-get install watchdog sway greetd chromium
   2. edit /etc/watchdog.conf and uncomment "watchdog-device         =
   /dev/watchdog" line. Save the file
   3. mkdir -p ~/.config/sway
   4. cp /etc/sway/config ~/.config/sway/
   5. edit /etc/greetd/config.toml and change the following lines
   command = "/usr/bin/sway" to launch sway on startup
   user = "(yourUserName)"
   6. systemctl enable watchdog
   7. systemctl enable greetd
   8. reboot
   9. when the computer starts in sway, press [cmd] + [Enter] to launch a terminal
   10. chromium --ozone-platform-hint=wayland
   11. As root - echo c > /proc/sysrq-trigger
   12. The computer boots into BusyBox, displaying the message I sent earlier

I have tried other graphical environments such as GNOME, and the issue is
reproducible there as well when an unclean shutdown is triggered.

*NUC5i5RYB:*
dumpe2fs -h /dev/sda2
dumpe2fs 1.47.1 (20-May-2024)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          eb3f94f2-7f1e-4405-9239-9738ea84499d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index
orphan_file filetype needs_recovery extent 64bit flex_bg metadata_csum_seed
sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
orphan_present
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              3645440
Block count:              14572800
Reserved block count:     728640
Overhead clusters:        307682
Free blocks:              13653939
Free inodes:              3596069
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Nov  6 16:29:19 2024
Last mount time:          Thu Nov  7 14:23:48 2024
Last write time:          Thu Nov  7 14:23:47 2024
Mount count:              2
Maximum mount count:      -1
Last checked:             Wed Nov  6 16:48:05 2024
Check interval:           0 (<none>)
Lifetime writes:          5583 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      073ed4d9-bf6a-433e-ab6f-5c16c26b2c21
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x193901e2
Checksum seed:            0xcc5a740b
Orphan file inode:        12
Journal features:         journal_incompat_revoke journal_64bit
journal_checksum_v3
Total journal size:       256M
Total journal blocks:     65536
Max transaction length:   65536
Fast commit length:       0
Journal sequence:         0x00004e09
Journal start:            18537
Journal checksum type:    crc32c
Journal checksum:         0xee80ca9e
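
As a cross-check of the numbers above against the kernel message quoted
below, here is a minimal Python sketch. It assumes the usual ext4 numbering,
where inode numbers start at 1 and each block group holds "Inodes per group"
inodes; the function name locate_inode is only illustrative. With 8192
inodes per group, orphan inode 3408176 lands on bit 303 of its group's inode
bitmap, which appears to match the bit index in the ext4_test_bit line.

```python
from typing import Tuple


def locate_inode(inode_number: int, inodes_per_group: int) -> Tuple[int, int]:
    """Return (block group, bit index within that group's inode bitmap)."""
    # Inode numbers are 1-based, so shift to 0-based before dividing.
    return divmod(inode_number - 1, inodes_per_group)


# Values taken from the dumpe2fs output above and the quoted kernel message.
group, bit = locate_inode(3408176, 8192)
print(f"group={group} bit={bit}")  # prints "group=416 bit=303"
```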

*NUC11TNKi7:*
dumpe2fs 1.47.1 (20-May-2024)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          aa673884-2798-4c16-8412-e6e6d6634227
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index
orphan_file filetype needs_recovery extent 64bit flex_bg metadata_csum_seed
sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
orphan_present
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              7356416
Block count:              29394944
Reserved block count:     1469747
Overhead clusters:        608244
Free blocks:              28166565
Free inodes:              7306375
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Thu Nov  7 14:23:49 2024
Last mount time:          Thu Nov  7 15:12:06 2024
Last write time:          Thu Nov  7 15:12:05 2024
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Nov  7 15:12:05 2024
Check interval:           0 (<none>)
Lifetime writes:          6552 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      d0fc1889-cf0a-45ee-8988-ed1167f88490
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xd0b259cc
Checksum seed:            0x4edf1204
Orphan file inode:        12
Journal features:         journal_incompat_revoke journal_64bit
journal_checksum_v3
Total journal size:       512M
Total journal blocks:     131072
Max transaction length:   131072
Fast commit length:       0
Journal sequence:         0x00004f54
Journal start:            86508
Journal checksum type:    crc32c
Journal checksum:         0x6cd94711
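
Both dumps above list needs_recovery and orphan_present among the filesystem
features while reporting the state as clean. For comparing dumps from several
machines, a small Python sketch like the following may help. It is only an
illustration: the function name parse_dumpe2fs_header is made up, and it
assumes that a line without a colon is a mail-wrapped continuation of the
previous field, as in the output pasted here.

```python
def parse_dumpe2fs_header(text):
    """Parse 'dumpe2fs -h' style text into a dict of field name -> value."""
    fields = {}
    key = None
    for line in text.splitlines():
        if ":" in line:
            # Split on the first colon only, so dates in values stay intact.
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
        elif key is not None and line.strip():
            # Assumption: colon-less lines continue the previous field.
            fields[key] += " " + line.strip()
    return fields


# Excerpt copied from the output above (wrapped as it appears in this mail).
SAMPLE = """\
dumpe2fs 1.47.1 (20-May-2024)
Filesystem features:      has_journal ext_attr resize_inode dir_index
orphan_file filetype needs_recovery extent 64bit flex_bg metadata_csum_seed
sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
orphan_present
Filesystem state:         clean
"""

fields = parse_dumpe2fs_header(SAMPLE)
features = set(fields["Filesystem features"].split())
pending = sorted(features & {"needs_recovery", "orphan_present"})
print(fields["Filesystem state"], pending)
```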

*Under Trixie:*
I have downgraded linux-image to the following:
linux-image-6.9.9
linux-image-6.8.12
linux-image-6.6.15

Each time, the computer boots into BusyBox after an unclean shutdown.

I am unable to find the 1.47.1~rc-1 version, so I can't confirm that one. I
have tested 1.47.1~rc1-1, 1.47.1~rc1-2, 1.47.1~rc1-3 and 1.47.1-1, and the
issue is reproducible on all of these versions.


Please let me know if you need more info.

Thank you!

On Wed, Nov 6, 2024 at 5:17 PM Theodore Ts'o <ty...@mit.edu> wrote:

> On Wed, Nov 06, 2024 at 01:23:01PM -0500, Anees Ahmad wrote:
> > Hello,
> >
> > The issue described initially does not happen when both packages
> > *e2fsprogs_1.47.0-2.4_amd64.deb*
> > <
> https://snapshot.debian.org/archive/debian/20240314T094714Z/pool/main/e/e2fsprogs/e2fsprogs_1.47.0-2.4_amd64.deb
> >
> >  and *libext2fs2t64_1.47.0-2.4_amd64.deb*
> > <
> https://snapshot.debian.org/archive/debian/20240314T094714Z/pool/main/e/e2fsprogs/libext2fs2t64_1.47.0-2.4_amd64.deb
> >
> > were downgraded to 1.47.0-2.4 on the same computer.
> > The bug appeared in 1.47.1~rc1-1.
>
>
> How reproducible is the problem with the 1.47.1~rc-1?
>
> And can you describe the hardware where this is happening --- what
> kind of storage device, etc.?  And can you reproduce it on some other
> hardware?  And can you send the output of dumpe2fs -h /dev/sda2?
>
> This is not a problem that I've seen on any of my hardware, or on my
> regression test suites, so without a clean reproducer it's going to be
> very hard to work the problem.
>
> Also, note that the error messages which you reported:
>
> /dev/sda2: recovering journal
> /dev/sda2: Clearing orphaned inode 3408176 (uid=1000, gid=1000,
> mode-0100600, size=64)
> /dev/sda2: clean, 150916/3645440 files, 1460364/14572800 blocks
> [    3.7641111 EXT4-fs error (device sda2): ext4_orphan_get:1421: comm
> mount: bad orphan inode 3408176
> [    3.7641371 ext4_test_bit (bit-303, block-13631504) = 0
> EXT4-fs error (device sda2): ext4 orphan_get:1121: comm mount: bad orphan
> inode 3408176
> ext1_test_bit(bit-303, block-13631504) = 0
> [  3.7646551 EXT4-fs error (device sda2):
> ext4_mark_recovery_complete:6229: comm mount: Orphan file not empty on
> read-only fs. 3.764746) EXT4-fs (sda2): mount failed
> mount: mounting /dev/sda2 on /root failed: Structure needs cleaning
> Failed to mount /dev/sda2 as root file system.
> EXT4-fs error (device sda2): ext4_mark_recovery_complete:6229: comm mount:
> Orphan file not empty on read-only fs. EXT4-fs (sda2): mount failed
> BusyBox v1.37.0 (Debian 1:1.37.0-4) built-in shell (ash)
>
>
> Are emitted by the kernel, and happen before any program in e2fsprogs
> has a chance to run.  Hence, it is highly unlikely that the version of
> e2fsprogs would make a difference in this message.
>
> The specific error message indicates that there is an inode on the
> orphan list (in this case, inode #3408176) which is marked as unused
> (i.e., free) in the block allocation bitmap.  This is a file system
> corruption problem, and is generally indicative of a kernel bug, or
> some kind of hardware problem/failure/bug.
>
> The reason why I'm a bit dubious that it is a kernel bug is (a) no one
> else has reported anything like this, and (b) I am regularly running
> kernel regression test which tests the unclean shutdown code path, and
> I haven't seen a failure in this area of the kernel code for years.
>
> Cheers,
>
>                                                         - Ted
>
