Hello,
I have some Fedora Linux VMs (current versions, latest updates) in a
virtual test infrastructure on VirtualBox. There I run different VMs
with different filesystems (ext4, xfs, zfs, bcachefs and btrfs).
I had a problem with the underlying hardware where around 1000 4k
blocks could no longer be read. I migrated the whole disk with ddrescue,
which worked well.
Of course I was expecting some data loss in the VMs, but wanted to get
them into a consistent state.
The following filesystems were easily brought into a consistent state
with their corresponding repair/scrub tools:
- ext4
- xfs
- zfs
Unfortunately, two filesystems cannot be brought into a state where the
filesystem repair tools report "everything fine" (of course with some
data loss, but that's fine):
- btrfs
- bcachefs
Commands run with bcachefs (git version):
git log -n1 | head -n1
commit 1e058db4b603f8992b781b4654b48221dd04407a
./bcachefs version
1.12.0
But bcachefs never reached a consistent state, even with newer
versions. I also checked with an older version (1.7.0); that check ran
for a long time.
To reproduce the problem, I created a new filesystem and copied some
files onto it:
mkfs.bcachefs -f /dev/sdb
time cp -Rap /usr /mnt
Afterwards I created a (quick & dirty) script "corrupt_device.sh" that
corrupts the device in the same way as the original failure (1000
randomly chosen 4k blocks are overwritten).
Script: see below
~/corrupt_device.sh
./bcachefs fsck -pf /dev/sdb
./bcachefs fsck -pfR /dev/sdb
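To see whether repeated repair passes ever converge, the fsck call can
be wrapped in a loop that logs the exit status of every pass. A minimal
sketch; repair_until_clean is a hypothetical helper, and it assumes that
bcachefs fsck follows the fsck(8) convention of exiting with 0 only when
nothing was found to fix, which should be verified for the version in
use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a repair command (given as arguments) up to
# MAX_PASSES times and stop at the first pass that exits with status 0.
# Assumption: status 0 means "nothing left to fix" (fsck(8) convention).
repair_until_clean() {
  local max_passes=$1
  shift
  local pass status
  for (( pass = 1; pass <= max_passes; pass++ )); do
    status=0
    "$@" || status=$?          # capture the status without tripping set -e
    echo "pass ${pass}: exit status ${status}"
    if (( status == 0 )); then
      return 0
    fi
  done
  echo "still not clean after ${max_passes} passes" >&2
  return 1
}

# Usage, mirroring the commands above:
#   repair_until_clean 5 ./bcachefs fsck -pf /dev/sdb
```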
Result: the problem is reproducible; bcachefs cannot be brought into a
consistent state, even after several repair runs.
Feel free to reproduce it yourself and turn it into a test case.
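For a repeatable test case it helps if the corruption pattern can be
replayed exactly and does not need a real disk. Below is a minimal
sketch, assuming bash 4 or later, GNU coreutils, and a disk-image file
instead of /dev/sdb (it could, for example, be attached with losetup
before formatting). The function name corrupt_image and its arguments
are hypothetical. Seeding bash's $RANDOM makes the chosen block
positions depend only on the seed (the sequence for a given seed may
differ between bash versions); the overwritten contents still come from
/dev/urandom and differ between runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcachefs-tools): overwrite COUNT
# distinct, randomly chosen 4 KiB blocks of a disk-image file with random
# data. The block positions depend only on SEED, so a run can be replayed.
corrupt_image() {
  local image=$1 count=$2 seed=$3
  local block_size=4096
  local size nblocks block
  size=$(stat -c %s -- "$image") || return 1   # image size in bytes (GNU stat)
  nblocks=$(( size / block_size ))             # valid blocks: 0 .. nblocks-1
  if (( count > nblocks )); then
    echo "corrupt_image: count ${count} exceeds ${nblocks} blocks" >&2
    return 1
  fi

  RANDOM=$seed                 # seeding bash's PRNG makes the picks repeatable
  local -A picked=()
  while (( ${#picked[@]} < count )); do
    # Two 15-bit $RANDOM values give a 30-bit number (images up to 4 TiB)
    block=$(( ((RANDOM << 15) | RANDOM) % nblocks ))
    if [[ -z ${picked[$block]:-} ]]; then
      picked[$block]=1
      # conv=notrunc keeps dd from truncating a regular file after the block
      dd if=/dev/urandom of="$image" bs="$block_size" seek="$block" count=1 \
         conv=notrunc status=none
      echo "$block"            # report which block was overwritten
    fi
  done
}

# Usage (hypothetical image path):
#   truncate -s 8G /tmp/bcachefs-test.img
#   corrupt_image /tmp/bcachefs-test.img 1000 42 > corrupted-blocks.txt
```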
Any ideas how to repair it and what can be done to get it into a
consistent state?
Thnx.
Ciao,
Gerhard
Script corrupt_device.sh:
#!/usr/bin/env bash
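set -euo pipefail

# Quick & dirty: overwrite COUNT distinct, randomly chosen 4k blocks of
# OUTPUT_DEVICE with random data. WARNING: destroys data on OUTPUT_DEVICE.
RANDOM_DEVICE=/dev/urandom
OUTPUT_DEVICE=/dev/sdb
COUNT=1000
BLOCK_SIZE=4096
# Device size in bytes
DEVICE_SIZE=$(blockdev --getsize64 "${OUTPUT_DEVICE}")
echo "# Device size=${DEVICE_SIZE}"
# Valid block numbers are 0 .. MAX_BLOCK_NUMBER (the last whole block)
MAX_BLOCK_NUMBER=$((DEVICE_SIZE / BLOCK_SIZE - 1))
echo "# Maximum block number=${MAX_BLOCK_NUMBER}"
# shuf --head-count=N on a range yields N distinct numbers: no block is hit twice
for BLOCK in $(shuf --input-range=0-"${MAX_BLOCK_NUMBER}" --head-count="${COUNT}"); do
  dd if="${RANDOM_DEVICE}" of="${OUTPUT_DEVICE}" bs="${BLOCK_SIZE}" \
     seek="${BLOCK}" count=1 status=none
done
sync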
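To check how many distinct 4k blocks were actually changed, a copy of
the device or image taken before corruption can be compared byte by
byte with the corrupted one. A minimal sketch assuming GNU cmp and awk;
the function name count_changed_blocks and the paths in the usage
comment are hypothetical. cmp -l prints one line per differing byte, so
1000 corrupted blocks produce up to about 4 million lines; awk handles
that, but it is not fast.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many distinct blocks differ between two
# images of equal size (e.g. a copy taken before corruption and the result).
count_changed_blocks() {
  local before=$1 after=$2 block_size=${3:-4096}
  # cmp -l prints one line per differing byte: 1-based offset, then the two
  # byte values in octal; awk maps each offset to its block and counts new ones.
  cmp -l -- "$before" "$after" \
    | awk -v bs="$block_size" '!seen[int(($1 - 1) / bs)]++ { n++ } END { print n + 0 }'
}

# Usage (assumption: /tmp/before.img is a copy taken before corrupting):
#   count_changed_blocks /tmp/before.img /dev/sdb
```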