On Fri, Jun 28, 2024 at 01:55:08PM +0200, Martin Steigerwald wrote:
> Hi!
>
> I am ending a migration from a ThinkPad T14 AMD Gen 1 to a ThinkPad T14
> AMD Gen 5.
>
> Last filesystem is a BCacheFS with some larger files that I use for testing
> BCacheFS. rsync was directly pulling from the older laptop over 1 GBit
> link through my local router. All other filesystems are BTRFS and there
> has not been an issue with migrating about 1,8 TiB of data to three
> BTRFS filesystems via rsync.
>
> Standard Debian Unstable Kernel as of today (on Devuan):
>
> Linux version 6.9.7-amd64 ([email protected])
> (x86_64-linux-gnu-gcc-13 (Debian 13.3.0-1) 13.3.0,
> GNU ld (GNU Binutils for Debian) 2.42.50.20240625)
> #1 SMP PREEMPT_DYNAMIC Debian 6.9.7-1 (2024-06-27)
>
> % bcachefs version
> 1.9.1
>
> SSD is 4 TB Samsung 990 Pro. BCacheFS is on LUKS encrypted LVM as the
> BTRFS filesystems as well.
>
> I created BCacheFS as follows (this is from a subsequent mkfs.bcachefs.
> I do not have initial output anymore as I already overwrote it with the
> new successful attempt in my documentation, but other than UUIDs nothing
> should have changed I bet, the parameters were identical - see below for
> the successful attempt):
>
> % mkfs.bcachefs --data_checksum xxhash --metadata_checksum xxhash
> --compression=lz4 /dev/nvme1/daten2
> [… identifiers deleted …]
> Device index: 0
> Label:
> Version: 1.7: mi_btree_bitmap
> Version upgrade complete: 0.0: (unknown version)
> Oldest version on disk: 1.7: mi_btree_bitmap
> Created: […]
> Sequence number: 0
> Time of last write: Thu Jan 1 01:00:00 1970
> Superblock size: 976 B/1.00 MiB
> Clean: 0
> Devices: 1
> Sections: members_v1,members_v2
> Features:
> Compat features:
>
> Options:
> block_size: 512 B
> btree_node_size: 256 KiB
> errors: continue [ro] panic
> metadata_replicas: 1
> data_replicas: 1
> metadata_replicas_required: 1
> data_replicas_required: 1
> encoded_extent_max: 64.0 KiB
> metadata_checksum: none crc32c crc64 [xxhash]
> data_checksum: none crc32c crc64 [xxhash]
> compression: lz4
> background_compression: none
> str_hash: crc32c crc64 [siphash]
> metadata_target: none
> foreground_target: none
> background_target: none
> promote_target: none
> erasure_code: 0
> inodes_32bit: 1
> shard_inode_numbers: 1
> inodes_use_key_cache: 1
> gc_reserve_percent: 8
> gc_reserve_bytes: 0 B
> root_reserve_percent: 0
> wide_macs: 0
> acl: 1
> usrquota: 0
> grpquota: 0
> prjquota: 0
> journal_flush_delay: 1000
> journal_flush_disabled: 0
> journal_reclaim_delay: 100
> journal_transaction_names: 1
> version_upgrade: [compatible] incompatible none
> nocow: 0
>
> members_v2 (size 160):
> Device: 0
> Label: (none)
> UUID: […]
> Size: 300 GiB
> read errors: 0
> write errors: 0
> checksum errors: 0
> seqread iops: 0
> seqwrite iops: 0
> randread iops: 0
> randwrite iops: 0
> Bucket size: 256 KiB
> First bucket: 0
> Buckets: 1228800
> Last mount: (never)
> Last superblock write: 0
> State: rw
> Data allowed: journal,btree,user
> Has data: (none)
> Btree allocated bitmap blocksize: 1.00 B
> Btree allocated bitmap:
> 0000000000000000000000000000000000000000000000000000000000000000
> Durability: 1
> Discard: 0
> Freespace initialized: 0
>
>
> Directly after creating it I mounted it from /etc/fstab:
>
> /dev/nvme1/daten2 /daten2 bcachefs lazytime 0 0
>
> Soon after the copying process started, rsync got stuck in "D" state.
> It was within the first 500 MiB or so. Nothing in kernel log. I waited
> a bit and then stopped rsync. One rsync process remained in "D" state
> and thus did not go away. I tried another time and one of the rsync
> processes was immediately in "D" state.
>
> Thus I rebooted. Runit hung during reboot. Likely due to processes in
> D state. I eventually switched off the laptop by pressing the power
> button long enough.
>
> I did an fsck.bcachefs and got:
>
> % fsck.bcachefs /dev/nvme1/daten2
> fsck binary is version 1.9: disk_accounting_v2 but filesystem is 1.7:
> mi_btree_bitmap and kernel is 1.7: mi_btree_bitmap, using kernel fsck
> bcachefs (dm-5): mounting version 1.7: mi_btree_bitmap
> opts=ro,metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4,degraded,fsck,fix_errors=ask,read_only
> bcachefs (dm-5): recovering from clean shutdown, journal seq 45
> bcachefs (dm-5): journal read done, replaying entries 45-45
> bcachefs (dm-5): alloc_read... done
> bcachefs (dm-5): stripes_read... done
> bcachefs (dm-5): snapshots_read... done
> bcachefs (dm-5): check_allocations...key version number higher than recorded:
> 73014444594 > 0: fix? (y,n, or Y,N for all errors of this type) y
> key version number higher than recorded: 81604378807 > 73014444594: fix?
> (y,n, or Y,N for all errors of this type) y
> dev 0 has wrong free buckets: got 0, should be 1220580: fix? (y,n, or Y,N for
> all errors of this type) y
> dev 0 has wrong sb buckets: got 0, should be 13: fix? (y,n, or Y,N for all
> errors of this type) y
> dev 0 has wrong sb sectors: got 0, should be 6152: fix? (y,n, or Y,N for all
> errors of this type) y
> dev 0 has wrong sb fragmented: got 0, should be 504: fix? (y,n, or Y,N for
> all errors of this type) y
> dev 0 has wrong journal buckets: got 0, should be 8192: fix? (y,n, or Y,N for
> all errors of this type) y
> dev 0 has wrong journal sectors: got 0, should be 4194304: fix? (y,n, or Y,N
> for all errors of this type) y
> dev 0 has wrong btree buckets: got 0, should be 15: fix? (y,n, or Y,N for all
> errors of this type) y
> dev 0 has wrong btree sectors: got 0, should be 7680: fix? (y,n, or Y,N for
> all errors of this type) y
> fs has wrong hidden: got 0, should be 4200960: fix? (y,n, or Y,N for all
> errors of this type) y
> fs has wrong btree: got 0, should be 7680: fix? (y,n, or Y,N for all errors
> of this type) y
> fs has wrong nr_inodes: got 20, should be 22: fix? (y,n, or Y,N for all
> errors of this type) y
> fs has wrong btree: 1/1 [0]: got 0, should be 7680: fix? (y,n, or Y,N for all
> errors of this type) y
> done
> bcachefs (dm-5): going read-write
> bcachefs (dm-5): journal_replay... done
> bcachefs (dm-5): check_alloc_info...y done
> bcachefs (dm-5): check_lrus... done
> bcachefs (dm-5): check_btree_backpointers... done
> bcachefs (dm-5): check_backpointers_to_extents... done
> bcachefs (dm-5): check_extents_to_backpointers... done
> bcachefs (dm-5): check_alloc_to_lru_refs... done
> bcachefs (dm-5): check_snapshot_trees... done
> bcachefs (dm-5): check_snapshots... done
> bcachefs (dm-5): check_subvols... done
> bcachefs (dm-5): check_subvol_children... done
> bcachefs (dm-5): delete_dead_snapshots... done
> bcachefs (dm-5): check_inodes... done
> bcachefs (dm-5): check_extents... done
> bcachefs (dm-5): check_indirect_extents... done
> bcachefs (dm-5): check_dirents... done
> bcachefs (dm-5): check_xattrs... done
> bcachefs (dm-5): check_root... done
> bcachefs (dm-5): check_subvolume_structure... done
> bcachefs (dm-5): check_directory_structure... done
> bcachefs (dm-5): check_nlinks... done
> bcachefs (dm-5): resume_logged_ops... done
> bcachefs (dm-5): delete_dead_inodes... done
> bcachefs (dm-5): shutdown complete, journal seq 47
> dm-5: errors fixed
>
> For a regular unclean shutdown I would not have expected any filesystem
> errors. A subsequent call to "fsck.bcachefs" revealed no further errors.
>
> I mounted the filesystem again and tried another time with rsync and
> it did not seem to get stuck as before. However I felt uncomfortable
> with continuing with a filesystem that has had errors already.
> Especially as BCacheFS is still marked experimental.

This should all be fixed in this branch: https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-6.9
