Unfortunately 6.10 and 6.11 aren't options for the normal use of my laptop. But I was easily able to recover from backups so no harm done and I'm back on 6.9.
I do think problems with being able to switch between different kernel versions are a pretty big deal, though. At least they are in my workflows. Would an approach similar to the one ZFS takes be better, where the filesystem's on-disk format is never upgraded automatically but requires the admin to manually run an upgrade? Can tests be added to your test suite to make sure previous kernels can still access and use filesystems after they've been upgraded by newer kernels? Being able to revert to an earlier kernel if a newer one has problems is a big deal for me.

Thanks,
Carl

> On 2024-10-16 5:09 PM PDT Kent Overstreet <[email protected]> wrote:
>
>
> On Tue, Oct 15, 2024 at 09:51:16PM -0700, Carl E. Thompson wrote:
> > Hi,
> >
> > I believe there is another newer version downgrade bug in bcachefs
> > (tested versions: 6.9.4 <--> 6.11.3).
> >
> > My laptop runs kernel 6.9.4 normally with 4 bcachefs filesystems
> > on LVM2 logical volumes mounted including the root filesystem. I
> > needed to test something under 6.11 so I booted kernel 6.11.3 and
> > used the system normally from the console (bcachefs worked fine
> > under 6.11.3). After attempting to boot back into 6.9.4 my laptop
> > no longer starts and hangs when trying to mount and manipulate
> > the root filesystem. The kernel log shows kernel traces due to
> > hung copygc tasks (see dmesg output below). This happens every
> > time I try to start 6.9.4 now. The kernel log reveals that the
> > bcachefs filesystem seems to complete the version downgrade and
> > initial mount successfully but it starts hanging as soon as the
> > filesystem is used. Booting back into the 6.11.3 kernel causes
> > the filesystems to work again but I can't run 6.11 on my laptop
> > normally because 6.11 (and 6.10) have amdgpu issues that cause
> > irrecoverable graphical desktop lockups. So right now I can
> > either choose to boot with filesystems that don't work or with
> > periodic hard graphical desktop crashes neither of which is
> > ideal.
>
> Yeah, it looks like 6.9 isn't running the recovery passes specified in
> the superblock downgrade section, meaning we start running without
> correct accounting counters - 6.10 works, though.
>
> 6.10 is a LTS release and 6.9 is not - is 6.10 an option?
>
>
> >
> > On my laptop and some of my other computers I boot multiple Linux
> > distributions which usually run different kernels and mount the same
> > filesystems on all of them (except root). So I do need to be able to switch
> > back and forth between kernels as needed on all of my systems and these
> > types of issues give me some pause. I will disable bcachefs use on my dev
> > systems and servers for now until I am more confident that there is a solid
> > testing plan in place to make sure there can be no more of these kinds of
> > issues in the future when booting multiple kernels. I will keep bcachefs on
> > my laptop for testing. A fix for my laptop isn't urgent for me personally
> > as I can recreate the filesystems under 6.9.4 and restore from backups. Of
> > course other people might need a fix more quickly. Next time I need to
> > boot a different kernel I'll make sure to create LVM snapshots of the
> > devices first to which I can revert if needed.
> >
> > Thanks,
> > Carl
> >
> > show-super from one affected filesystem:
> > ---
> > [clip carl]# bcachefs show-super /dev/clip/root-alpine
> > Device: (unknown device)
> > External UUID: c992a5de-c9b3-4fd1-82ed-4d2f66bc11cb
> > Internal UUID: 43b4fe97-f5a4-48b3-8d99-3a3dda25211a
> > Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
> > Device index: 0
> > Label: (none)
> > Version: 1.12: (unknown version)
> > Version upgrade complete: 1.12: (unknown version)
> > Oldest version on disk: 1.4: member_seq
> > Created: Fri Mar 22 19:19:01 2024
> > Sequence number: 249
> > Time of last write: Tue Oct 15 20:23:34 2024
> > Superblock size: 4.45 KiB/1.00 MiB
> > Clean: 0
> > Devices: 1
> > Sections: members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
> > Features: lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
> > Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
> >
> > Options:
> >   block_size: 4.00 KiB
> >   btree_node_size: 256 KiB
> >   errors: continue [fix_safe] panic ro
> >   metadata_replicas: 1
> >   data_replicas: 1
> >   metadata_replicas_required: 1
> >   data_replicas_required: 1
> >   encoded_extent_max: 64.0 KiB
> >   metadata_checksum: none [crc32c] crc64 xxhash
> >   data_checksum: none [crc32c] crc64 xxhash
> >   compression: lz4
> >   background_compression: none
> >   str_hash: crc32c crc64 [siphash]
> >   metadata_target: none
> >   foreground_target: none
> >   background_target: none
> >   promote_target: none
> >   erasure_code: 0
> >   inodes_32bit: 1
> >   shard_inode_numbers: 1
> >   inodes_use_key_cache: 1
> >   gc_reserve_percent: 8
> >   gc_reserve_bytes: 0 B
> >   root_reserve_percent: 0
> >   wide_macs: 0
> >   promote_whole_extents: 0
> >   acl: 1
> >   usrquota: 0
> >   grpquota: 0
> >   prjquota: 0
> >   journal_flush_delay: 1000
> >   journal_flush_disabled: 0
> >   journal_reclaim_delay: 100
> >   journal_transaction_names: 1
> >   allocator_stuck_timeout: 30
> >   version_upgrade: [compatible] incompatible none
> >   nocow: 0
> >
> > members_v2 (size 160):
> > Device: 0
> >   Label: (none)
> >   UUID: 352e33b9-dde4-48da-8fe2-255ae78c6320
> >   Size: 24.0 GiB
> >   read errors: 0
> >   write errors: 0
> >   checksum errors: 2
> >   seqread iops: 0
> >   seqwrite iops: 0
> >   randread iops: 0
> >   randwrite iops: 0
> >   Bucket size: 256 KiB
> >   First bucket: 0
> >   Buckets: 98304
> >   Last mount: Tue Oct 15 20:23:32 2024
> >   Last superblock write: 249
> >   State: rw
> >   Data allowed: journal,btree,user
> >   Has data: journal,btree,user
> >   Btree allocated bitmap blocksize: 1.00 MiB
> >   Btree allocated bitmap: 0000000000000000000000000000011111111111111111111111111111111111
> >   Durability: 1
> >   Discard: 1
> >   Freespace initialized: 1
> >
> > errors (size 24):
> > bset_bad_csum 1 Sat Jul 6 07:43:37 2024
> >
> >
> > dmesg output:
> > ---
> >
> > ...
> >
> > [ 230.456893] bcachefs (dm-7): mounting version 1.12: (unknown version) opts=compression=lz4
> > [ 230.456911] bcachefs (dm-7): recovering from clean shutdown, journal seq 4901
> > [ 230.456915] bcachefs (dm-7): Version downgrade required:
> > [ 230.469098] bcachefs (dm-7): alloc_read... done
> > [ 230.469111] bcachefs (dm-7): stripes_read... done
> > [ 230.469115] bcachefs (dm-7): snapshots_read... done
> > [ 230.469436] bcachefs (dm-7): journal_replay... done
> > [ 230.469441] bcachefs (dm-7): resume_logged_ops... done
> > [ 230.469450] bcachefs (dm-7): going read-write
> > [ 368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122 seconds.
> > [ 368.351336] Not tainted 6.9.4-arch1-1 #1
> > [ 368.351338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 368.351340] task:bch-copygc/dm-7 state:D stack:0 pid:547 tgid:547 ppid:2 flags:0x00004000
> > [ 368.351345] Call Trace:
> > [ 368.351348] <TASK>
> > [ 368.351354] __schedule+0x3c7/0x1510
> > [ 368.351368] schedule+0x27/0xf0
> > [ 368.351372] __closure_sync+0x7e/0x140
> > [ 368.351382] __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351436] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351440] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351441] ? __kmalloc+0x1a7/0x440
> > [ 368.351446] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351448] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351452] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351454] ? local_clock_noinstr+0xd/0xd0
> > [ 368.351456] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351457] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351460] ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351489] bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351511] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 368.351512] ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351539] bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351573] ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351602] bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351631] ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351652] bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351681] ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351702] bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351732] bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351775] ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351828] ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 368.351868] kthread+0xcf/0x100
> > [ 368.351876] ? __pfx_kthread+0x10/0x10
> > [ 368.351882] ret_from_fork+0x31/0x50
> > [ 368.351889] ? __pfx_kthread+0x10/0x10
> > [ 368.351894] ret_from_fork_asm+0x1a/0x30
> > [ 368.351905] </TASK>
> > [ 491.230894] INFO: task bch-copygc/dm-7:547 blocked for more than 245 seconds.
> > [ 491.230914] Not tainted 6.9.4-arch1-1 #1
> > [ 491.230920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 491.230924] task:bch-copygc/dm-7 state:D stack:0 pid:547 tgid:547 ppid:2 flags:0x00004000
> > [ 491.230939] Call Trace:
> > [ 491.230944] <TASK>
> > [ 491.230955] __schedule+0x3c7/0x1510
> > [ 491.230984] schedule+0x27/0xf0
> > [ 491.230993] __closure_sync+0x7e/0x140
> > [ 491.231011] __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231160] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231169] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231174] ? __kmalloc+0x1a7/0x440
> > [ 491.231186] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231192] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231206] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231211] ? local_clock_noinstr+0xd/0xd0
> > [ 491.231218] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231223] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231232] ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231340] bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231412] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 491.231418] ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231509] bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231625] ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231732] bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231823] ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231883] bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.231963] ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.232022] bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.232089] bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.232148] ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.232217] ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [ 491.232271] kthread+0xcf/0x100
> > [ 491.232282] ? __pfx_kthread+0x10/0x10
> > [ 491.232289] ret_from_fork+0x31/0x50
> > [ 491.232298] ? __pfx_kthread+0x10/0x10
> > [ 491.232304] ret_from_fork_asm+0x1a/0x30
> > [ 491.232319] </TASK>
> >
> > ...
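
One way to spot this situation before rebooting into an older kernel is to compare the on-disk version reported by show-super with the newest version that kernel understands. The following is only a minimal Python sketch under stated assumptions: it parses saved show-super text (with the "> > " quote prefixes removed), the function names parse_version and needs_downgrade are hypothetical, and the max_supported value of (1, 7) is a made-up placeholder, not the real limit of any kernel release.

```python
import re


def parse_version(show_super_text, field="Version"):
    """Return (major, minor) from a show-super line such as 'Version: 1.12: ...'."""
    pattern = rf"^\s*{re.escape(field)}:\s*(\d+)\.(\d+)"
    match = re.search(pattern, show_super_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{field}:' line found")
    # Compare as integers: as strings, "1.12" would sort before "1.7".
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(show_super_text, max_supported):
    """True if the on-disk version is newer than max_supported (hypothetical input)."""
    return parse_version(show_super_text) > max_supported


# Fields copied from the show-super output quoted above.
SAMPLE = """\
Version: 1.12: (unknown version)
Version upgrade complete: 1.12: (unknown version)
Oldest version on disk: 1.4: member_seq
"""

if __name__ == "__main__":
    print(parse_version(SAMPLE))
    # (1, 7) is a placeholder for "newest version the older kernel knows".
    print(needs_downgrade(SAMPLE, max_supported=(1, 7)))
```

If needs_downgrade returns True, mounting under the older kernel would go through the downgrade path, so that would be the point to take LVM snapshots first.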
