On Sun, Sep 01, 2024 at 02:01:25PM GMT, Gerhard Wiesinger wrote:
> Hello,
> 
> I have some Fedora Linux VMs (current versions, latest updates) in a
> virtual test infrastructure on VirtualBox. There I run different VMs with
> different filesystems (ext4, xfs, zfs, bcachefs and btrfs).
> 
> The underlying hardware had a problem where around 1000 4k blocks could
> no longer be read. I migrated the whole disk with ddrescue, which worked
> well.
> 
> Of course I was expecting some data loss in the VMs, but wanted to get
> them into a consistent state.
> 
> The following filesystems were easily brought into a consistent state
> with their respective repair/scrub tools:
> - ext4
> - xfs
> - zfs
> 
> Unfortunately, two filesystems can't be brought into a state where their
> repair tools report "everything fine" (with some lost data, of course,
> but that's fine):
> - btrfs
> - bcachefs
> 
> Commands were run with bcachefs built from git:
> git log -n1 | head -n1
> commit 1e058db4b603f8992b781b4654b48221dd04407a
> ./bcachefs version
> 1.12.0
> 
> But bcachefs never reached a consistent state, even with newer versions.
> I also ran checks with an older version (1.7.0) for a long time.
> 
> To reproduce the problem, I created a new filesystem and copied some
> files onto it:
> mkfs.bcachefs -f /dev/sdb
> time cp -Rap /usr /mnt
> 
> Afterwards I wrote a (quick & dirty) script, "corrupt_device.sh", to
> corrupt the device in the same manner as the original failure (1000
> randomly chosen 4k blocks are overwritten).
> Script: see below
> 
> ~/corrupt_device.sh
> ./bcachefs fsck -pf /dev/sdb
> ./bcachefs fsck -pfR /dev/sdb
> 
> Result: it is reproducible that bcachefs can't be brought into a
> consistent state, even after several repair runs.
> 
> You could also try to reproduce it and turn it into a test case.
> 
> Any ideas on how to repair it, and what can be done to get it into a
> consistent state?
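
The original corrupt_device.sh was not included in this message. As a
rough illustration only, here is a minimal sketch of such a corruption
step in POSIX sh, assuming GNU coreutils (shuf, stat -c, and dd with
status=none and iflag=fullblock) and util-linux blockdev. The function
name corrupt_device and its arguments are hypothetical. It overwrites
COUNT distinct, randomly chosen 4 KiB blocks with random data, so it will
destroy data on whatever target it is given: point it only at a scratch
disk or image file.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the original corrupt_device.sh from this thread.
# Overwrites COUNT distinct, randomly chosen 4 KiB blocks of DEV with
# random data. WARNING: destroys data. Use only on scratch devices/images.
corrupt_device() {
    dev=$1
    count=${2:-1000}
    bs=4096

    # Size in bytes: blockdev for block devices, stat for image files.
    if [ -b "$dev" ]; then
        size=$(blockdev --getsize64 "$dev")
    else
        size=$(stat -c %s "$dev")
    fi
    nblocks=$((size / bs))

    # shuf without -r emits distinct block numbers (at most nblocks of them).
    shuf -i 0-$((nblocks - 1)) -n "$count" | while read -r blk; do
        # iflag=fullblock guards against short reads from /dev/urandom;
        # conv=notrunc stops dd from truncating an image file after the write.
        dd if=/dev/urandom of="$dev" bs="$bs" seek="$blk" count=1 \
           iflag=fullblock conv=notrunc status=none
    done
}

# Example (scratch disk only): corrupt_device /dev/sdb 1000
```

Picking distinct blocks (shuf without -r) keeps the count of damaged
blocks at exactly COUNT instead of occasionally hitting the same block
twice; conv=notrunc only matters when the target is a regular image file.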

If you've got a filesystem you want data off of, send me a metadata
dump (join the IRC channel, send it via magic wormhole) and I'll debug it.

We still haven't comprehensively torture tested all the repair paths
(which is probably the biggest reason it's still marked as
experimental); all the repair paths are there, but there are still bugs
to shake out.

Thanks for the test - I'll try to make use of it when I'm working in
that area again.
