On Tue, Oct 15, 2024 at 09:51:16PM -0700, Carl E. Thompson wrote:
> Hi,
>
> I believe there is another newer version downgrade bug in bcachefs
> (tested versions: 6.9.4 <--> 6.11.3).
>
> My laptop runs kernel 6.9.4 normally with 4 bcachefs filesystems
> on LVM2 logical volumes mounted including the root filesystem. I
> needed to test something under 6.11 so I booted kernel 6.11.3 and
> used the system normally from the console (bcachefs worked fine
> under 6.11.3). After attempting to boot back into 6.9.4 my laptop
> no longer starts and hangs when trying to mount and manipulate
> the root filesystem. The kernel log shows kernel traces due to
> hung copygc tasks (see dmesg output below). This happens every
> time I try to start 6.9.4 now. The kernel log reveals that the
> bcachefs filesystem seems to complete the version downgrade and
> initial mount successfully but it starts hanging as soon as the
> filesystem is used. Booting back into the 6.11.3 kernel causes
> the filesystems to work again but I can't run 6.11 on my laptop
> normally because 6.11 (and 6.10) have amdgpu issues that cause
> irrecoverable graphical desktop lockups. So right now I can
> either choose to boot with filesystems that don't work or with
> periodic hard graphical desktop crashes, neither of which is
> ideal.

Yeah, it looks like 6.9 isn't running the recovery passes specified in
the superblock downgrade section, meaning we start running without
correct accounting counters - 6.10 works, though. 6.10 is an LTS
release and 6.9 is not - is 6.10 an option?

> On my laptop and some of my other computers I boot multiple Linux
> distributions which usually run different kernels and mount the same
> filesystems on all of them (except root). So I do need to be able to switch
> back and forth between kernels as needed on all of my systems and these types
> of issues give me some pause. I will disable bcachefs use on my dev systems
> and servers for now until I am more confident that there is a solid testing
> plan in place to make sure there can be no more of these kinds of issues in
> the future when booting multiple kernels. I will keep bcachefs on my laptop
> for testing. A fix for my laptop isn't urgent for me personally as I can
> recreate the filesystems under 6.9.4 and restore from backups. Of course
> other people might need a fix more quickly. Next time I need to boot a
> different kernel I'll make sure to create LVM snapshots of the devices first
> to which I can revert if needed.
>
> Thanks,
> Carl
>
> show-super from one affected filesystem:
> ---
> [clip carl]# bcachefs show-super /dev/clip/root-alpine
> Device: (unknown device)
> External UUID:
> c992a5de-c9b3-4fd1-82ed-4d2f66bc11cb
> Internal UUID:
> 43b4fe97-f5a4-48b3-8d99-3a3dda25211a
> Magic number:
> c68573f6-66ce-90a9-d96a-60cf803df7ef
> Device index: 0
> Label: (none)
> Version: 1.12: (unknown version)
> Version upgrade complete: 1.12: (unknown version)
> Oldest version on disk: 1.4: member_seq
> Created: Fri Mar 22 19:19:01 2024
> Sequence number: 249
> Time of last write: Tue Oct 15 20:23:34 2024
> Superblock size: 4.45 KiB/1.00 MiB
> Clean: 0
> Devices: 1
> Sections:
> members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
> Features:
> lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
> Compat features:
> alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
>
> Options:
> block_size: 4.00 KiB
> btree_node_size: 256 KiB
> errors: continue [fix_safe] panic ro
> metadata_replicas: 1
> data_replicas: 1
> metadata_replicas_required: 1
> data_replicas_required: 1
> encoded_extent_max: 64.0 KiB
> metadata_checksum: none [crc32c] crc64 xxhash
> data_checksum: none [crc32c] crc64 xxhash
> compression: lz4
> background_compression: none
> str_hash: crc32c crc64 [siphash]
> metadata_target: none
> foreground_target: none
> background_target: none
> promote_target: none
> erasure_code: 0
> inodes_32bit: 1
> shard_inode_numbers: 1
> inodes_use_key_cache: 1
> gc_reserve_percent: 8
> gc_reserve_bytes: 0 B
> root_reserve_percent: 0
> wide_macs: 0
> promote_whole_extents: 0
> acl: 1
> usrquota: 0
> grpquota: 0
> prjquota: 0
> journal_flush_delay: 1000
> journal_flush_disabled: 0
> journal_reclaim_delay: 100
> journal_transaction_names: 1
> allocator_stuck_timeout: 30
> version_upgrade: [compatible] incompatible none
> nocow: 0
>
> members_v2 (size 160):
> Device: 0
> Label: (none)
> UUID:
> 352e33b9-dde4-48da-8fe2-255ae78c6320
> Size: 24.0 GiB
> read errors: 0
> write errors: 0
> checksum errors: 2
> seqread iops: 0
> seqwrite iops: 0
> randread iops: 0
> randwrite iops: 0
> Bucket size: 256 KiB
> First bucket: 0
> Buckets: 98304
> Last mount: Tue Oct 15 20:23:32 2024
> Last superblock write: 249
> State: rw
> Data allowed: journal,btree,user
> Has data: journal,btree,user
> Btree allocated bitmap blocksize: 1.00 MiB
> Btree allocated bitmap:
> 0000000000000000000000000000011111111111111111111111111111111111
> Durability: 1
> Discard: 1
> Freespace initialized: 1
>
> errors (size 24):
> bset_bad_csum 1 Sat Jul 6
> 07:43:37 2024
>
>
> dmesg output:
> ---
>
> ...
>
> [ 230.456893] bcachefs (dm-7): mounting version 1.12: (unknown version)
> opts=compression=lz4
> [ 230.456911] bcachefs (dm-7): recovering from clean shutdown, journal seq
> 4901
> [ 230.456915] bcachefs (dm-7): Version downgrade required:
> [ 230.469098] bcachefs (dm-7): alloc_read... done
> [ 230.469111] bcachefs (dm-7): stripes_read... done
> [ 230.469115] bcachefs (dm-7): snapshots_read... done
> [ 230.469436] bcachefs (dm-7): journal_replay... done
> [ 230.469441] bcachefs (dm-7): resume_logged_ops... done
> [ 230.469450] bcachefs (dm-7): going read-write
> [ 368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122
> seconds.
> [ 368.351336] Not tainted 6.9.4-arch1-1 #1
> [ 368.351338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 368.351340] task:bch-copygc/dm-7 state:D stack:0 pid:547 tgid:547
> ppid:2 flags:0x00004000
> [ 368.351345] Call Trace:
> [ 368.351348] <TASK>
> [ 368.351354] __schedule+0x3c7/0x1510
> [ 368.351368] schedule+0x27/0xf0
> [ 368.351372] __closure_sync+0x7e/0x140
> [ 368.351382] __bch2_write+0x136b/0x1660 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351436] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351440] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351441] ? __kmalloc+0x1a7/0x440
> [ 368.351446] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351448] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351452] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351454] ? local_clock_noinstr+0xd/0xd0
> [ 368.351456] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351457] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351460] ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351489] bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351511] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 368.351512] ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351539] bch2_data_update_init+0x68b/0x1420 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351573] ? bch2_move_extent+0x3da/0xed0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351602] bch2_move_extent+0x3da/0xed0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351631] ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351652] bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351681] ? bch2_copygc+0x210/0x880 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351702] bch2_copygc+0x210/0x880 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351732] bch2_copygc_thread+0x152/0x3d0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351775] ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351828] ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 368.351868] kthread+0xcf/0x100
> [ 368.351876] ? __pfx_kthread+0x10/0x10
> [ 368.351882] ret_from_fork+0x31/0x50
> [ 368.351889] ? __pfx_kthread+0x10/0x10
> [ 368.351894] ret_from_fork_asm+0x1a/0x30
> [ 368.351905] </TASK>
> [ 491.230894] INFO: task bch-copygc/dm-7:547 blocked for more than 245
> seconds.
> [ 491.230914] Not tainted 6.9.4-arch1-1 #1
> [ 491.230920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 491.230924] task:bch-copygc/dm-7 state:D stack:0 pid:547 tgid:547
> ppid:2 flags:0x00004000
> [ 491.230939] Call Trace:
> [ 491.230944] <TASK>
> [ 491.230955] __schedule+0x3c7/0x1510
> [ 491.230984] schedule+0x27/0xf0
> [ 491.230993] __closure_sync+0x7e/0x140
> [ 491.231011] __bch2_write+0x136b/0x1660 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231160] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231169] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231174] ? __kmalloc+0x1a7/0x440
> [ 491.231186] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231192] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231206] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231211] ? local_clock_noinstr+0xd/0xd0
> [ 491.231218] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231223] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231232] ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231340] bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231412] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 491.231418] ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231509] bch2_data_update_init+0x68b/0x1420 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231625] ? bch2_move_extent+0x3da/0xed0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231732] bch2_move_extent+0x3da/0xed0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231823] ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231883] bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.231963] ? bch2_copygc+0x210/0x880 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.232022] bch2_copygc+0x210/0x880 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.232089] bch2_copygc_thread+0x152/0x3d0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.232148] ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.232217] ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs
> 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [ 491.232271] kthread+0xcf/0x100
> [ 491.232282] ? __pfx_kthread+0x10/0x10
> [ 491.232289] ret_from_fork+0x31/0x50
> [ 491.232298] ? __pfx_kthread+0x10/0x10
> [ 491.232304] ret_from_fork_asm+0x1a/0x30
> [ 491.232319] </TASK>
>
> ...
