在 2025/7/9 08:29, Dave Chinner 写道:
On Tue, Jul 08, 2025 at 09:55:14AM +0200, Christian Brauner wrote:
On Mon, Jul 07, 2025 at 05:45:32PM -0700, Darrick J. Wong wrote:
On Tue, Jul 08, 2025 at 08:52:47AM +0930, Qu Wenruo wrote:
在 2025/7/8 08:32, Dave Chinner 写道:
On Fri, Jul 04, 2025 at 10:12:29AM +0930, Qu Wenruo wrote:
Currently all the filesystems implementing the
super_opearations::shutdown() callback can not afford losing a device.
Thus fs_bdev_mark_dead() will just call the shutdown() callback for the
involved filesystem.
But it will no longer be the case, with multi-device filesystems like
btrfs and bcachefs the filesystem can handle certain device loss without
shutting down the whole filesystem.
To allow those multi-device filesystems to be integrated to use
fs_holder_ops:
- Replace super_opearation::shutdown() with
super_opearations::remove_bdev()
To better describe when the callback is called.
This conflates cause with action.
The shutdown callout is an action that the filesystem must execute,
whilst "remove bdev" is a cause notification that might require an
action to be take.
Yes, the cause could be someone doing hot-unplug of the block
device, but it could also be something going wrong in software
layers below the filesystem. e.g. dm-thinp having an unrecoverable
corruption or ENOSPC errors.
We already have a "cause" notification: blk_holder_ops->mark_dead().
The generic fs action that is taken by this notification is
fs_bdev_mark_dead(). That action is to invalidate caches and shut
down the filesystem.
btrfs needs to do something different to a blk_holder_ops->mark_dead
notification. i.e. it needs an action that is different to
fs_bdev_mark_dead().
Indeed, this is how bcachefs already handles "single device
died" events for multi-device filesystems - see
bch2_fs_bdev_mark_dead().
I do not think it's the correct way to go, especially when there is already
fs_holder_ops.
We're always going towards a more generic solution, other than letting the
individual fs to do the same thing slightly differently.
On second thought -- it's weird that you'd flush the filesystem and
shrink the inode/dentry caches in a "your device went away" handler.
Fancy filesystems like bcachefs and btrfs would likely just shift IO to
a different bdev, right? And there's no good reason to run shrinkers on
either of those fses, right?
Yes, the naming is not perfect and mixing cause and action, but the end
result is still a more generic and less duplicated code base.
I think dchinner makes a good point that if your filesystem can do
something clever on device removal, it should provide its own block
device holder ops instead of using fs_holder_ops. I don't understand
why you need a "generic" solution for btrfs when it's not going to do
what the others do anyway.
I think letting filesystems implement their own holder ops should be
avoided if we can. Christoph may chime in here. I have no appettite for
exporting stuff like get_bdev_super() unless absolutely necessary. We
tried to move all that handling into the VFS to eliminate a slew of
deadlocks we detected and fixed. I have no appetite to repeat that
cycle.
Except it isn't actually necessary.
Everyone here seems to be assuming that the filesystem *must* take
an active superblock reference to process a device removal event,
and that is *simply not true*.
bcachefs does not use get_bdev_super() or an active superblock
reference to process ->mark_dead events.
Yes, bcachefs doesn't go this path, but the reason is more important.
Is it just because there is no such callback so that Kent has to come up
his own solution, or something else?
If the former case, all your argument here makes no sense.
Adding Kent here to see if he wants a more generic s_op->remove_bdev()
solution or the current each fs doing its own mark_dead() callback.
Thanks,
Qu
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel