Hello,

There appears to be a bug in bcachefs in which certain changes to subvolumes and snapshots can leave the system unable to suspend. Specifically, if a bcachefs snapshot is taken of a subvolume, a file is then removed or modified in either the subvolume or the snapshot, and the subvolume and snapshot are then both deleted, s2idle fails from that point on until the system is rebooted. This is 100% reproducible on my laptop running rc7.

Here is a short example of something that triggers the bug:

---
[carl@clip test]$ bcachefs subvolume create subvol
[carl@clip test]$ touch subvol/file
[carl@clip test]$ bcachefs subvolume snapshot subvol snapshot_of_subvol
[carl@clip test]$ rm subvol/file
[carl@clip test]$ bcachefs subvolume delete subvol
[carl@clip test]$ bcachefs subvolume delete snapshot_of_subvol
---

After this, suspending the system fails and produces kernel messages like the following:

---
[10898.793676] Freezing remaining freezable tasks
[10918.797255] Freezing remaining freezable tasks failed after 20.003 seconds (0 tasks refusing to freeze, wq_busy=1):
[10918.797270] Showing freezable workqueues that are still busy:
[10918.797273] workqueue events_freezable: flags=0x4
[10918.797277] pwq 28: cpus=14 node=0 flags=0x0 nice=0 active=0/0 refcnt=2
[10918.797289] inactive: pci_pme_list_scan
[10918.797309] workqueue bcachefs_write_ref: flags=0x4
[10918.797314] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/0 refcnt=3
[10918.797322] in-flight: 12451:bch2_subvolume_wait_for_pagecache_and_delete [bcachefs] bch2_subvolume_wait_for_pagecache_and_delete [bcachefs]
[10918.797519] workqueue bcachefs_io: flags=0x1c
[10918.797525] pwq 9: cpus=4 node=0 flags=0x0 nice=-20 active=0/0 refcnt=2
[10918.797532] inactive: journal_write_work [bcachefs]
[10918.797616] workqueue bcachefs_write_ref: flags=0x4
[10918.797620] pwq 18: cpus=9 node=0 flags=0x0 nice=0 active=2/0 refcnt=3
[10918.797626] in-flight: 17562:bch2_subvolume_wait_for_pagecache_and_delete [bcachefs] bch2_subvolume_wait_for_pagecache_and_delete [bcachefs]
[10918.798386] Restarting kernel threads ... done.
[10918.799643] OOM killer enabled.
[10918.799647] Restarting tasks ... done.
[10918.803749] random: crng reseeded on system resumption
[10919.295422] PM: suspend exit
---

- I have only tested this on bcachefs filesystems with 4k blocks.
If this is related to the issue behind one of my other bug reports today, then it may not happen on filesystems with 512-byte blocks (untested).

Thank you,
Carl Thompson
