Hello,
I'm running my normal workstation with git kernels from
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git
and just got the second file system corruption in three weeks. I do not
have issues with stable kernels, and just want to give you a heads up
that there might be something seriously broken in current development
kernels.
The first corruption was with a kernel based on 4.18.0-rc1
(wt-2018-06-20) and the second one today based on 4.18.0-rc4
(wt-2018-07-09).
The first corruption definitely destroyed data; the second one has not
been looked at at all yet.
After the reinstall I ran several scrubs; the last one that completed
successfully was one week ago.
Of course this could be unrelated to the development kernels or even
btrfs, but two corruptions within weeks after years without problems is
very suspicious.
And since btrfs also allowed corrupted data to be read (with a stable
Ubuntu kernel, see below for more details), it looks like this is indeed
an issue in btrfs, correct?
A btrfs subvolume is used as the rootfs on a "Samsung SSD 850 EVO mSATA
1TB" and I'm running Gentoo ~amd64 on a Thinkpad W530. Discard is
enabled as a mount option and there were roughly 5 other subvolumes.
I'm currently backing up the full btrfs partition after the second
corruption which announced itself with the following log entries:
[ 979.223767] BTRFS critical (device sdc2): corrupt leaf: root=2 block=1029783552 slot=1, unexpected item end, have 16161 expect 16250
[ 979.223808] BTRFS: error (device sdc2) in __btrfs_cow_block:1080: errno=-5 IO failure
[ 979.223810] BTRFS info (device sdc2): forced readonly
[ 979.224599] BTRFS warning (device sdc2): Skipping commit of aborted transaction.
[ 979.224603] BTRFS: error (device sdc2) in cleanup_transaction:1847: errno=-5 IO failure
After that I'll restore the system from a backup and stick to stable
kernels for now, but if needed I can of course also restore the
partition backup to another disk for testing.
Here is what I can say about the first crash:
On Jul 4th I discovered severe file system corruption, and when booting
with init=/bin/bash even tools like parted failed with a report about
invalid ELF headers for some library. I started an Ubuntu 17.10 install
on another physical disk and copied some data from the damaged btrfs
volume to the Ubuntu disk. While I COULD copy the files, quite a few
of the interesting ones were broken. For example, the git tree I rescued
from the broken btrfs disk is unusable. The broken files I found all
look about the correct size but contain only 0x01:
$ hexdump -C .git/objects/9d/732f6506e4cecd6d2b50c5008f9d1255198c1e
00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
00000e26
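To find out how many of the rescued files are affected, the copied tree
can be scanned for files that consist of nothing but 0x01 bytes. The
following is only a minimal sketch, assuming Python 3 is available on
the rescue system. The function names and the default path are
hypothetical and just for illustration. It only detects files that are
filled entirely with one byte value, so partially damaged files are not
caught.

```python
import os
import sys


def is_filled_with(path, byte=0x01, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte equals `byte`."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # bytes.count() accepts an int in range(256) in Python 3.
            if chunk.count(byte) != len(chunk):
                return False
    return seen_data


def find_filled_files(root, byte=0x01):
    """Yield regular files under `root` whose content is only `byte`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are checked.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_filled_with(path, byte):
                    yield path
            except OSError:
                # Unreadable files are silently skipped in this sketch.
                continue


if __name__ == "__main__":
    # The directory to scan is passed as the first argument (assumption:
    # it is the location the rescued data was copied to).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_filled_files(root):
        print(path)
```

Running it against the rescued copy prints one path per suspicious
file, which gives at least a rough list of what needs to be restored
from backup.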
After copying the files I tried a "btrfs check --repair", which found
countless errors, and I aborted it after I got more than 3 million
lines of output. After the abort the complete home dir and everything
beneath it was simply gone. I gave up on the install and set the system
up from scratch, starting with reformatting the damaged partition.
I also exported the root subvolume with btrfs send to a file.
The full output from the repair attempt can be downloaded here:
https://www.awhome.eu/index.php/s/6jXtBTEeyA2ns3d
Kind regards,
Alexander