On Tue, Oct 15, 2024 at 09:51:16PM -0700, Carl E. Thompson wrote:
> Hi,
> 
>      I believe there is another newer version downgrade bug in bcachefs 
> (tested versions: 6.9.4 <--> 6.11.3).
> 
>      My laptop runs kernel 6.9.4 normally with 4 bcachefs filesystems
>      on LVM2 logical volumes mounted including the root filesystem. I
>      needed to test something under 6.11 so I booted kernel 6.11.3 and
>      used the system normally from the console (bcachefs worked fine
>      under 6.11.3). After attempting to boot back into 6.9.4 my laptop
>      no longer starts and hangs when trying to mount and manipulate
>      the root filesystem. The kernel log shows kernel traces due to
>      hung copygc tasks (see dmesg output below). This happens every
>      time I try to start 6.9.4 now. The kernel log reveals that the
>      bcachefs filesystem seems to complete the version downgrade and
>      initial mount successfully but it starts hanging as soon as the
>      filesystem is used. Booting back into the 6.11.3 kernel causes
>      the filesystems to work again but I can't run 6.11 on my laptop
>      normally because 6.11 (and 6.10) have amdgpu issues that cause
>      irrecoverable graphical desktop lockups. So right now I can
>      either choose to boot with filesystems that don't work or with
>      periodic hard graphical desktop crashes neither of which is
>      ideal.

Yeah, it looks like 6.9 isn't running the recovery passes specified in
the superblock downgrade section, meaning we start running without
correct accounting counters - 6.10 works, though.
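
If you want to check what the superblock reports before picking a kernel,
here is a minimal Python sketch that parses saved `bcachefs show-super`
text like the output quoted below. The function names are made up for
illustration, and it only tells you whether a downgrade section is listed,
not whether a given kernel will honor it.

```python
import re

# "Key:   value" lines; the value may be empty when the mail client or
# terminal wrapped it onto the following line.
KEY_LINE = re.compile(r"^\s*([A-Za-z][\w ]*):\s*(.*)$")

def parse_show_super(text):
    """Collect the first occurrence of each 'Key: value' field."""
    fields = {}
    last_key = None
    for line in text.splitlines():
        m = KEY_LINE.match(line)
        if m:
            last_key = m.group(1).strip()
            fields.setdefault(last_key, m.group(2).strip())
        elif last_key and line.strip() and not fields[last_key]:
            # Continuation line holding a wrapped value.
            fields[last_key] = line.strip()
    return fields

def version_tuple(fields):
    """Return (major, minor) from the 'Version' field, or None."""
    m = re.match(r"(\d+)\.(\d+)", fields.get("Version", ""))
    return (int(m.group(1)), int(m.group(2))) if m else None

def has_downgrade_section(fields):
    """True if 'downgrade' appears in the comma-separated Sections list."""
    return "downgrade" in fields.get("Sections", "").split(",")
```

Feed it the text saved from `bcachefs show-super <device>`, with the mail
quoting ("> ") stripped first.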

6.10 is an LTS release and 6.9 is not - is 6.10 an option?


> 
>      On my laptop and some of my other computers I boot multiple Linux 
> distributions which usually run different kernels and mount the same 
> filesystems on all of them (except root). So I do need to be able to switch 
> back and forth between kernels as needed on all of my systems and these types 
> of issues give me some pause. I will disable bcachefs use on my dev systems 
> and servers for now until I am more confident that there is a solid testing 
> plan in place to make sure there can be no more of these kinds of issues in 
> the future when booting multiple kernels. I will keep bcachefs on my laptop 
> for testing. A fix for my laptop isn't urgent for me personally as I can 
> recreate the filesystems under 6.9.4 and restore from backups. Of course 
> other people might need a fix more quickly. Next time I need to boot a 
> different kernel I'll make sure to create LVM snapshots of the devices first 
> to which I can revert if needed. 
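
For the snapshot-before-switching step, a rough Python sketch that only
builds the LVM command lines (it does not run them). The volume group
"clip" and volume "root-alpine" are taken from the device path below; the
snapshot tag, the 2G size and the function names are assumptions for
illustration.

```python
import shlex

def snapshot_commands(vg, lvs, tag, size="2G"):
    """Build one `lvcreate --snapshot` command per logical volume."""
    return [
        ["lvcreate", "--snapshot", "--name", f"{lv}-{tag}",
         "--size", size, f"{vg}/{lv}"]
        for lv in lvs
    ]

def revert_command(vg, lv, tag):
    """Build the `lvconvert --merge` command that rolls the origin back."""
    return ["lvconvert", "--merge", f"{vg}/{lv}-{tag}"]

if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in snapshot_commands("clip", ["root-alpine"], "pre-6.11"):
        print(shlex.join(cmd))
    print(shlex.join(revert_command("clip", "root-alpine", "pre-6.11")))
```

Keep in mind that merging a snapshot back into an origin that is in use
(such as a mounted root filesystem) is deferred until the volume is next
activated.
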
> 
> Thanks,
> Carl
> 
> show-super from one affected filesystem:
> ---
> [clip carl]# bcachefs show-super /dev/clip/root-alpine 
> Device:                                     (unknown device)
> External UUID:                             c992a5de-c9b3-4fd1-82ed-4d2f66bc11cb
> Internal UUID:                             43b4fe97-f5a4-48b3-8d99-3a3dda25211a
> Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
> Device index:                              0
> Label:                                     (none)
> Version:                                   1.12: (unknown version)
> Version upgrade complete:                  1.12: (unknown version)
> Oldest version on disk:                    1.4: member_seq
> Created:                                   Fri Mar 22 19:19:01 2024
> Sequence number:                           249
> Time of last write:                        Tue Oct 15 20:23:34 2024
> Superblock size:                           4.45 KiB/1.00 MiB
> Clean:                                     0
> Devices:                                   1
> Sections:                                  members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
> Features:                                  lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
> Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
> 
> Options:
>   block_size:                              4.00 KiB
>   btree_node_size:                         256 KiB
>   errors:                                  continue [fix_safe] panic ro 
>   metadata_replicas:                       1
>   data_replicas:                           1
>   metadata_replicas_required:              1
>   data_replicas_required:                  1
>   encoded_extent_max:                      64.0 KiB
>   metadata_checksum:                       none [crc32c] crc64 xxhash 
>   data_checksum:                           none [crc32c] crc64 xxhash 
>   compression:                             lz4
>   background_compression:                  none
>   str_hash:                                crc32c crc64 [siphash] 
>   metadata_target:                         none
>   foreground_target:                       none
>   background_target:                       none
>   promote_target:                          none
>   erasure_code:                            0
>   inodes_32bit:                            1
>   shard_inode_numbers:                     1
>   inodes_use_key_cache:                    1
>   gc_reserve_percent:                      8
>   gc_reserve_bytes:                        0 B
>   root_reserve_percent:                    0
>   wide_macs:                               0
>   promote_whole_extents:                   0
>   acl:                                     1
>   usrquota:                                0
>   grpquota:                                0
>   prjquota:                                0
>   journal_flush_delay:                     1000
>   journal_flush_disabled:                  0
>   journal_reclaim_delay:                   100
>   journal_transaction_names:               1
>   allocator_stuck_timeout:                 30
>   version_upgrade:                         [compatible] incompatible none 
>   nocow:                                   0
> 
> members_v2 (size 160):
> Device:                                    0
>   Label:                                   (none)
>   UUID:                                    352e33b9-dde4-48da-8fe2-255ae78c6320
>   Size:                                    24.0 GiB
>   read errors:                             0
>   write errors:                            0
>   checksum errors:                         2
>   seqread iops:                            0
>   seqwrite iops:                           0
>   randread iops:                           0
>   randwrite iops:                          0
>   Bucket size:                             256 KiB
>   First bucket:                            0
>   Buckets:                                 98304
>   Last mount:                              Tue Oct 15 20:23:32 2024
>   Last superblock write:                   249
>   State:                                   rw
>   Data allowed:                            journal,btree,user
>   Has data:                                journal,btree,user
>   Btree allocated bitmap blocksize:        1.00 MiB
>   Btree allocated bitmap:                  0000000000000000000000000000011111111111111111111111111111111111
>   Durability:                              1
>   Discard:                                 1
>   Freespace initialized:                   1
> 
> errors (size 24):
> bset_bad_csum                               1               Sat Jul  6 07:43:37 2024
> 
> 
> dmesg output:
> ---
> 
> ...
> 
> [  230.456893] bcachefs (dm-7): mounting version 1.12: (unknown version) opts=compression=lz4
> [  230.456911] bcachefs (dm-7): recovering from clean shutdown, journal seq 4901
> [  230.456915] bcachefs (dm-7): Version downgrade required:
> [  230.469098] bcachefs (dm-7): alloc_read... done
> [  230.469111] bcachefs (dm-7): stripes_read... done
> [  230.469115] bcachefs (dm-7): snapshots_read... done
> [  230.469436] bcachefs (dm-7): journal_replay... done
> [  230.469441] bcachefs (dm-7): resume_logged_ops... done
> [  230.469450] bcachefs (dm-7): going read-write
> [  368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122 seconds.
> [  368.351336]       Not tainted 6.9.4-arch1-1 #1
> [  368.351338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  368.351340] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
> [  368.351345] Call Trace:
> [  368.351348]  <TASK>
> [  368.351354]  __schedule+0x3c7/0x1510
> [  368.351368]  schedule+0x27/0xf0
> [  368.351372]  __closure_sync+0x7e/0x140
> [  368.351382]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351436]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351440]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351441]  ? __kmalloc+0x1a7/0x440
> [  368.351446]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351448]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351452]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351454]  ? local_clock_noinstr+0xd/0xd0
> [  368.351456]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351457]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351460]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351489]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351511]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  368.351512]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351539]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351573]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351602]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351631]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351652]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351681]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351702]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351732]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351775]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351828]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  368.351868]  kthread+0xcf/0x100
> [  368.351876]  ? __pfx_kthread+0x10/0x10
> [  368.351882]  ret_from_fork+0x31/0x50
> [  368.351889]  ? __pfx_kthread+0x10/0x10
> [  368.351894]  ret_from_fork_asm+0x1a/0x30
> [  368.351905]  </TASK>
> [  491.230894] INFO: task bch-copygc/dm-7:547 blocked for more than 245 seconds.
> [  491.230914]       Not tainted 6.9.4-arch1-1 #1
> [  491.230920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  491.230924] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
> [  491.230939] Call Trace:
> [  491.230944]  <TASK>
> [  491.230955]  __schedule+0x3c7/0x1510
> [  491.230984]  schedule+0x27/0xf0
> [  491.230993]  __closure_sync+0x7e/0x140
> [  491.231011]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231160]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231169]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231174]  ? __kmalloc+0x1a7/0x440
> [  491.231186]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231192]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231206]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231211]  ? local_clock_noinstr+0xd/0xd0
> [  491.231218]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231223]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231232]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231340]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231412]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  491.231418]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231509]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231625]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231732]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231823]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231883]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.231963]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.232022]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.232089]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.232148]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.232217]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> [  491.232271]  kthread+0xcf/0x100
> [  491.232282]  ? __pfx_kthread+0x10/0x10
> [  491.232289]  ret_from_fork+0x31/0x50
> [  491.232298]  ? __pfx_kthread+0x10/0x10
> [  491.232304]  ret_from_fork_asm+0x1a/0x30
> [  491.232319]  </TASK>
> 
> ...
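
To summarize hung-task reports like the two above from a saved log, a small
Python sketch follows. The regex is written against the message format
shown in this dmesg excerpt and the function name is made up; other kernels
may word the message differently.

```python
import re

# Matches e.g. "[  368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122"
HUNG = re.compile(
    r"\[\s*\d+\.\d+\]\s+INFO: task (\S+):(\d+) blocked for more than (\d+)"
)

def hung_tasks(log):
    """Map (task name, pid) to the longest reported blocked time in seconds."""
    longest = {}
    for task, pid, secs in HUNG.findall(log):
        key = (task, int(pid))
        longest[key] = max(longest.get(key, 0), int(secs))
    return longest
```

Run over the excerpt above, this reports a single entry: the
bch-copygc/dm-7 task, pid 547, with its longest blocked time of 245 seconds.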
