On Tue, Jul 08, 2025 at 01:20:50PM -0700, Darrick J. Wong wrote:
> On Tue, Jul 08, 2025 at 12:20:00PM +0200, Jan Kara wrote:
> > On Mon 07-07-25 17:45:32, Darrick J. Wong wrote:
> > > On Tue, Jul 08, 2025 at 08:52:47AM +0930, Qu Wenruo wrote:
> > > > 在 2025/7/8 08:32, Dave Chinner 写道:
> > > > > On Fri, Jul 04, 2025 at 10:12:29AM +0930, Qu Wenruo wrote:
> > > > > > Currently all the filesystems implementing the
> > > > > > super_opearations::shutdown() callback can not afford losing a 
> > > > > > device.
> > > > > > 
> > > > > > Thus fs_bdev_mark_dead() will just call the shutdown() callback for 
> > > > > > the
> > > > > > involved filesystem.
> > > > > > 
> > > > > > But it will no longer be the case, with multi-device filesystems 
> > > > > > like
> > > > > > btrfs and bcachefs the filesystem can handle certain device loss 
> > > > > > without
> > > > > > shutting down the whole filesystem.
> > > > > > 
> > > > > > To allow those multi-device filesystems to be integrated to use
> > > > > > fs_holder_ops:
> > > > > > 
> > > > > > - Replace super_opearation::shutdown() with
> > > > > >    super_opearations::remove_bdev()
> > > > > >    To better describe when the callback is called.
> > > > > 
> > > > > This conflates cause with action.
> > > > > 
> > > > > The shutdown callout is an action that the filesystem must execute,
> > > > > whilst "remove bdev" is a cause notification that might require an
> > > > > action to be take.
> > > > > 
> > > > > Yes, the cause could be someone doing hot-unplug of the block
> > > > > device, but it could also be something going wrong in software
> > > > > layers below the filesystem. e.g. dm-thinp having an unrecoverable
> > > > > corruption or ENOSPC errors.
> > > > > 
> > > > > We already have a "cause" notification: blk_holder_ops->mark_dead().
> > > > > 
> > > > > The generic fs action that is taken by this notification is
> > > > > fs_bdev_mark_dead().  That action is to invalidate caches and shut
> > > > > down the filesystem.
> > > > > 
> > > > > btrfs needs to do something different to a blk_holder_ops->mark_dead
> > > > > notification. i.e. it needs an action that is different to
> > > > > fs_bdev_mark_dead().
> > > > > 
> > > > > Indeed, this is how bcachefs already handles "single device
> > > > > died" events for multi-device filesystems - see
> > > > > bch2_fs_bdev_mark_dead().
> > > > 
> > > > I do not think it's the correct way to go, especially when there is 
> > > > already
> > > > fs_holder_ops.
> > > > 
> > > > We're always going towards a more generic solution, other than letting 
> > > > the
> > > > individual fs to do the same thing slightly differently.
> > > 
> > > On second thought -- it's weird that you'd flush the filesystem and
> > > shrink the inode/dentry caches in a "your device went away" handler.
> > > Fancy filesystems like bcachefs and btrfs would likely just shift IO to
> > > a different bdev, right?  And there's no good reason to run shrinkers on
> > > either of those fses, right?
> > 
> > I agree it is awkward and bcachefs avoids these in case of removal it can
> > handle gracefully AFAICS.
> > 
> > > > Yes, the naming is not perfect and mixing cause and action, but the end
> > > > result is still a more generic and less duplicated code base.
> > > 
> > > I think dchinner makes a good point that if your filesystem can do
> > > something clever on device removal, it should provide its own block
> > > device holder ops instead of using fs_holder_ops.  I don't understand
> > > why you need a "generic" solution for btrfs when it's not going to do
> > > what the others do anyway.
> > 
> > Well, I'd also say just go for own fs_holder_ops if it was not for the
> > awkward "get super from bdev" step. As Christian wrote we've encapsulated
> > that in fs/super.c and bdev_super_lock() in particular but the calling
> > conventions for the fs_holder_ops are not very nice (holding
> > bdev_holder_lock, need to release it before grabbing practically anything
> > else) so I'd have much greater peace of mind if this didn't spread too
> > much. Once you call bdev_super_lock() and hold on to sb with s_umount held,
> > things are much more conventional for the fs land so I'd like if this
> > step happened before any fs hook got called. So I prefer something like
> > Qu's proposal of separate sb op for device removal over exporting
> > bdev_super_lock(). Like:
> > 
> > static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
> > {
> >         struct super_block *sb;
> > 
> >         sb = bdev_super_lock(bdev, false);
> >         if (!sb)
> >                 return;
> > 
> >     if (sb->s_op->remove_bdev) {
> >             sb->s_op->remove_bdev(sb, bdev, surprise);
> >             return;
> >     }
> 
> It feels odd but I could live with this, particularly since that's the
> direction that brauner is laying down. :)

I want to reiterate that no one is saying "under no circumstances
implement your own holder ops". But if you rely on the VFS locking then
you better not spill it's guts into your filesystem and make us export
this bloody locking that half the world had implemented wrong in their
drivers in the first places spewing endless syzbot deadlocks reports
that we had to track down and fix. That will not happen again similar
way we don't bleed all the nastiness of other locking paths.

Please all stop long philosophical treatises about things no on has ever
argued. btrfs wants to rely on the VFS infra. That is fine and well. We
will support and enable this.

I think the two method idea is fine given that they now are clearly
delineated.

Thanks for providing some clarity here, Darrick and Qu.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to