On 2021/1/22 7:55 AM, chainofflowers wrote:
Hi Qu,

it happened again. This time on my /home partition.
I rebooted from an external disk and ran btrfs check without first going
through btrfs scrub, and this is the output, no errors:

------------------------------------------
[manjaro oc]# btrfs check /dev/mapper/OMO
Opening filesystem to check…
Checking filesystem on /dev/mapper/OMO
UUID: 9362ac9d-c280-451d-9c8a-c09798e1c887
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 137523740672 bytes used, no error found
total csum bytes: 113842816
total tree bytes: 1740537856
total fs tree bytes: 1444249600
total extent tree bytes: 143835136
btree space waste bytes: 325744995
file data blocks allocated: 210346024960
  referenced 172314374144
------------------------------------------

Then, I rebooted from my internal disk, everything went well. I ran
btrfs scrub and also got no errors.

So far so good.


I have dumped the messages from journalctl, and the debug ones related
to btrfs were only the ones from btrfs_trim_block_group - so, the issue
is related to free space extents I guess?

Unfortunately, without the crash output, it can be anything.


I have attached the logs.
You can see that the last line:

------------------------------------------
Jan 21 23:57:25 <***> kernel: btrfs_trim_block_group: enter bg
start=26864517120 start=26864517120 end=27938258944 minlen=512
------------------------------------------

does not have a second matching line with "ret=0", because the kernel
stopped storing messages in the log. So, I guess the issue occurred
while btrfs_trim_block_group was working on 26864517120..27938258944.
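
A minimal sketch of how the unmatched "enter" line could be found mechanically in a journal dump. Only the "enter bg" line format is taken from the log above; the shape of the closing line (same function name plus "ret=<n>") is an assumption, and the helper names are hypothetical:

```python
import re

# Format of the "enter" line as it appears in the journal above.
ENTER_RE = re.compile(r"btrfs_trim_block_group: enter bg start=(\d+).*?end=(\d+)")
# Assumption: the closing debug line repeats the function name and has "ret=<n>".
EXIT_RE = re.compile(r"btrfs_trim_block_group:.*\bret=(-?\d+)")


def unfinished_trims(lines):
    """Return (start, end) ranges that were entered but never got a ret= line."""
    pending = []
    for line in lines:
        entered = ENTER_RE.search(line)
        if entered:
            pending.append((int(entered.group(1)), int(entered.group(2))))
            continue
        # Trimming is assumed to be sequential, so a ret= line closes
        # the most recently entered block group.
        if EXIT_RE.search(line) and pending:
            pending.pop()
    return pending


def range_size_gib(start, end):
    """Size of the byte range [start, end) in GiB."""
    return (end - start) / 2**30
```

Fed with the journal lines (one message per line, not wrapped as in this mail), unfinished_trims() should return the single range 26864517120..27938258944, which range_size_gib() reports as exactly one 1 GiB block group.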

If you have some machine running 24x7, like an RPi, I would recommend
setting up netconsole to catch the full dying message, to be extra safe.

Or set up kdump to catch the dying message.

Personally speaking, netconsole would be much easier to set up, though.
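
As a rough sketch of the receiving side (the always-on machine): netconsole sends kernel messages as plain UDP datagrams, by default to port 6666, so a small listener that appends them to a file is enough. The function names and the log path are hypothetical, and the sending machine still has to be configured with the netconsole module parameters separately:

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count):
    """Collect `count` datagrams from a bound socket, decoded as text."""
    messages = []
    for _ in range(count):
        data, _addr = sock.recvfrom(65535)
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


def log_forever(path, port=6666):
    """Append every received datagram to `path` until interrupted."""
    with open_listener(port=port) as sock, open(path, "a") as out:
        while True:
            data, _addr = sock.recvfrom(65535)
            out.write(data.decode("utf-8", errors="replace"))
            out.flush()  # push each message out promptly
```

Running log_forever("netconsole.log") on the RPi would then keep whatever the crashing kernel manages to send before it dies.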

Currently, with a truncated journal, it's really hard to say.

Thanks,
Qu

Unfortunately, I did not dump the output of dmesg directly at that
moment, so all I could get is what was available in the journal after
the reboot.

In the log you can also see that, some time earlier, BTRFS detected that
the space cache for dm-3 needed to be rebuilt:
------------------------------------------
Jan 21 19:29:17 <***> kernel: BTRFS warning (device dm-3): block group
82699091968 has wrong amount of free space
Jan 21 19:29:17 <***> kernel: BTRFS warning (device dm-3): failed to
load free space cache for block group 82699091968, rebuilding it now
------------------------------------------

Any hint about what I could do now?

Thanks! :-)


