Dear btrfs experts,
On my desktop PC, I have one btrfs partition on a single SSD with three
subvolumes (/, /home, /var). Whenever I boot my PC and log in to GNOME, the
btrfs partition is remounted read-only due to errors. This is the dmesg output
at that time:
> [ 616.155392] BTRFS error (device dm-0): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
> [ 616.155650] BTRFS error (device dm-0): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
> [ 616.155657] BTRFS: error (device dm-0) in __btrfs_free_extent:3054:
> errno=-5 IO failure
> [ 616.155662] BTRFS info (device dm-0): forced readonly
> [ 616.155665] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2124:
> errno=-5 IO failure
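To summarise these messages, here is a minimal Python sketch that extracts the block address and the two generation numbers from each "parent transid verify failed" line and counts repeats. The names `TRANSID_RE` and `summarize_transid_errors` are hypothetical helpers written for this mail, not part of btrfs-progs, and the pattern assumes the exact message wording quoted above.

```python
import re
from collections import Counter

# Hypothetical helper, not part of btrfs-progs. The pattern assumes the
# message wording quoted above.
TRANSID_RE = re.compile(
    r"parent transid verify failed on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def summarize_transid_errors(log_text):
    """Group transid failures by (bytenr, wanted, found) and count repeats."""
    # Mail clients wrap long lines, so drop "> " quote prefixes and rejoin.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    hits = Counter(
        tuple(int(group) for group in match.groups())
        for match in TRANSID_RE.finditer(flat)
    )
    return [
        {"bytenr": bytenr, "wanted": wanted, "found": found,
         "gap": wanted - found, "count": count}
        for (bytenr, wanted, found), count in hits.items()
    ]


sample = """\
> [ 616.155392] BTRFS error (device dm-0): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
> [ 616.155650] BTRFS error (device dm-0): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
"""

for entry in summarize_transid_errors(sample):
    print(entry)
```

For the messages above this reports a single block, 1144783093760, whose on-disk generation (2734305) is 2 behind the generation its parent expects (2734307).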
The issue started today after login. Yesterday everything worked fine.
I suspect something went wrong during the last shutdown, but I cannot confirm
it: my logs are stored on this same disk, and they show no errors for that
shutdown.
System info:
* Fedora 33 x86_64
* kernel: Linux 5.11.10-200.fc33.x86_64 #1 SMP
* btrfs-progs v5.10 (5.10-1.fc33.x86_64)
* Samsung 840 series SSD (SMART data looks fine)
What happens:
1. I boot my PC including mounting the root partition
2. Everything works fine.
3. I can log in as root or as my user on a tty and do basic things there
without problems.
4. I log in to my user account (gdm, GNOME shell). Alternatively, running e.g.
`dnf history info last` also triggers the dmesg output shown above.
5. Many applications don't work any more. The common root cause seems to be
that the filesystem is remounted readonly due to the errors noted above.
Basic info: see attached file "dmesg info.txt" (generated from Fedora live
system)
What I've tried so far:
1. I ran `btrfs scrub` from the live system. It aborts with an error:
> [root@localhost-live liveuser]# btrfs scrub start -B /mnt
> ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output
> error)
> scrub canceled for 1a149bda-057d-4775-ba66-1bf259fce9a5
> Scrub started: Sun Mar 28 07:20:07 2021
> Status: aborted
> Duration: 0:13:00
> Total to scrub: 269.06GiB
> Rate: 252.24MiB/s
> Error summary: no errors found
At the same time, in `dmesg`, I see this:
> [ 7878.612534] BTRFS error (device dm-2): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
> [ 7878.637673] BTRFS error (device dm-2): parent transid verify failed on
> 1144783093760 wanted 2734307 found 2734305
> [ 7878.639459] BTRFS info (device dm-2): scrub: not finished on devid 1 with
> status: -5
2. I ran `btrfs check` (without repair) from the live system. This also shows
errors (see attached file "btrfs check.txt").
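The scrub summary in step 1 does not say how far the scrub got before it aborted. A rough estimate can be derived from the reported figures, assuming that "Rate" is the average throughput over the whole "Duration" (an assumption about how this btrfs-progs version computes it). The function names below are hypothetical.

```python
def parse_duration(text):
    """Convert an H:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimate_scrubbed_gib(rate_mib_per_s, duration):
    """Estimate the scrubbed amount in GiB, assuming an average rate."""
    return rate_mib_per_s * parse_duration(duration) / 1024


# Figures copied from the scrub summary in step 1.
total_gib = 269.06
scrubbed_gib = estimate_scrubbed_gib(252.24, "0:13:00")
print(f"about {scrubbed_gib:.2f} GiB of {total_gib} GiB "
      f"({scrubbed_gib / total_gib:.0%}) scrubbed before the abort")
```

Under that assumption the scrub covered roughly 192 GiB, about 71% of the 269.06 GiB total, before it stopped.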
Side note: There is also a small chance that this issue was triggered by a
software update I installed yesterday. It included an update of systemd-246.10
to systemd-246.13 and of kernel-5.11.8 to kernel-5.11.10.
Changes in systemd: https://src.fedoraproject.org/rpms/systemd/commits/f33
Changes in kernel: https://src.fedoraproject.org/rpms/kernel/commits/f33
This update has also been deployed to many other users (I am using the stable
channel), and I have not seen any related reports in Fedora's Bugzilla or
Discourse, so I doubt it is the cause.
What should I do now? Do I need one of the invasive methods (`btrfs rescue` or
`btrfs check --repair`), and if so, which one should I choose?
Kind regards,
Chris
[root@localhost-live liveuser]# btrfs check
/dev/mapper/luks-ff6e174f-4cd3-42a7-8ee5-47005dd077dc
Opening filesystem to check...
ERROR: /dev/mapper/luks-ff6e174f-4cd3-42a7-8ee5-47005dd077dc is currently
mounted, use --force if you really intend to check the filesystem
[root@localhost-live liveuser]# btrfs check
/dev/mapper/luks-ff6e174f-4cd3-42a7-8ee5-47005dd077dc
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-ff6e174f-4cd3-42a7-8ee5-47005dd077dc
UUID: 1a149bda-057d-4775-ba66-1bf259fce9a5
[1/7] checking root items
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=1144881201152 item=14 parent level=1
child level=2
ERROR: failed to repair root items: Input/output error
[2/7] checking extents
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
Ignoring transid failure
bad block 1144783093760
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=1144881201152 item=14 parent level=1
child level=2
cache appears valid but isn't 1062040764416
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
parent transid verify failed on 1144783093760 wanted 2734307 found 2734305
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=1144881201152 item=14 parent level=1
child level=2
Error going to next leaf -5
csum exists for 1062926516224-1062935089152 but there is no extent record
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
ERROR: transid errors in file system
found 11738640384 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 3719168
total fs tree bytes: 0
total extent tree bytes: 3522560
btree space waste bytes: 1056895
file data blocks allocated: 69992448
referenced 69992448
[root@localhost-live liveuser]# btrfs --version
btrfs-progs v5.7
[root@localhost-live liveuser]# btrfs fi show
Label: 'fedora_chstpc-2' uuid: 1a149bda-057d-4775-ba66-1bf259fce9a5
Total devices 1 FS bytes used 230.46GiB
devid 1 size 300.00GiB used 269.06GiB path
/dev/mapper/luks-ff6e174f-4cd3-42a7-8ee5-47005dd077dc
[root@localhost-live liveuser]# btrfs fi df /mnt
Data, single: total=263.00GiB, used=228.47GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=3.00GiB, used=1.99GiB
GlobalReserve, single: total=397.25MiB, used=0.00B
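As a consistency check on the figures above, the per-profile totals from `btrfs fi df` can be added up and compared with the "used 269.06GiB" that `btrfs fi show` reports for devid 1. This is a minimal sketch that assumes DUP chunks occupy twice their logical size on the device and that GlobalReserve is carved out of the metadata allocation rather than allocated separately. The names `COPIES` and `chunks` are illustrative.

```python
# Raw-space multiplier per profile (assumption: DUP keeps two copies
# on the same device, single keeps one).
COPIES = {"single": 1, "DUP": 2}

# (type, profile, total in GiB) copied from the `btrfs fi df /mnt` output.
# GlobalReserve is left out on the assumption that it lives inside Metadata.
chunks = [
    ("Data", "single", 263.00),
    ("System", "DUP", 32.00 / 1024),  # 32.00MiB expressed in GiB
    ("Metadata", "DUP", 3.00),
]

allocated_gib = sum(total * COPIES[profile] for _, profile, total in chunks)
print(f"{allocated_gib:.2f} GiB allocated on the device")
```

The sum is 263 + 0.0625 + 6 = 269.0625 GiB, which matches the 269.06GiB shown by `btrfs fi show`.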