On Wed, Sep 20, 2023 at 09:28:44AM -0400, Brian Foster wrote:
> On Tue, Sep 19, 2023 at 09:21:16PM -0400, Kent Overstreet wrote:
> > Pulled this into the testing branch and then got
> > 
> > https://evilpiepirate.org/~testdashboard/c/de4ea1e2a9ceec5d55fffbc1acab89f0dc8f90b6/xfstests.generic.459/log.br
> > 
> > So I'll likely kick this patch back out for now, let me know when you
> > have a fixed version :)
> > 
> 
> Ah, sorry.. I should have mentioned this in the cover letter. I'm aware
> of this failure but my initial triage has it as an unrelated problem.
> That test basically induces I/O errors by explicitly overprovisioning a
> dm-thin volume for the fs. The original bug was a livelock issue on XFS
> related to metadata writeback failure/retry in this particular scenario.
> 
> The test relies on freeze in that it basically consumes all of the
> initially provisioned space, issues a freeze in the background (which
> will start off hanging because not everything can write back until more
> storage is available), and then grows available space so freeze can
> proceed to completion. It uses the success/failure of the freeze to
> determine pass/failure, and if the freeze fails it looks like it expects
> the filesystem to have been remounted ro (which I believe is ext4's way
> of dealing with this).
> 
> My notes say that freeze failed because the fs shutdown, which I think
> is due to the whole overprovision/flush thing leading to I/O errors
> (i.e. expected behavior, probably similar to ext4). But TBH I hadn't dug
> into it further than initial triage to rule out the core freeze
> mechanism itself. I'll dig more into that soon to see whether we need to
> change the test or something in the kernel, though I don't think it
> necessarily needs to gate freeze support..
> 

Yeah, so I can confirm that the only real difference in behavior between
ext4 and bcachefs wrt this test is that the former sets SB_RDONLY in
its internal error remount read-only sequence, which in turn results in
"ro" text in the /proc/mounts output and thus satisfies the test.
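For reference, this is roughly what "ro" in /proc/mounts amounts to. It is a minimal userspace sketch, not the xfstests code. The helper names opts_has_ro() and mount_shows_ro() are hypothetical, and it assumes only the standard <mntent.h> interface (setmntent/getmntent/endmntent). Matching whole comma-separated options matters here, because a naive substring search would also hit an option such as errors=remount-ro.

```c
#define _DEFAULT_SOURCE /* make sure the <mntent.h> interfaces are declared */
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if the comma-separated option list
 * contains "ro" as a whole option, 0 otherwise. */
static int opts_has_ro(const char *opts)
{
    assert(opts != NULL);
    const char *p = opts;
    while (*p != '\0') {
        size_t len = strcspn(p, ","); /* length of the current option */
        if (len == 2 && strncmp(p, "ro", 2) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++; /* skip the separator */
    }
    return 0;
}

/* Hypothetical helper: return 1 if the mount at mnt_dir shows "ro",
 * 0 if it does not, and -1 if the path is not listed or /proc/mounts
 * cannot be read. The last matching entry is used, on the assumption
 * that a later mount at the same path shadows earlier ones. */
static int mount_shows_ro(const char *mnt_dir)
{
    FILE *f = setmntent("/proc/mounts", "r");
    if (f == NULL)
        return -1;

    int result = -1;
    struct mntent *ent;
    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_dir, mnt_dir) == 0)
            result = opts_has_ro(ent->mnt_opts);
    }
    endmntent(f);
    return result;
}
```

With ext4's error remount, mount_shows_ro() on the scratch mount would return 1. On bcachefs, where the fatal error path leaves SB_RDONLY clear, it would return 0.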

bcachefs only sets SB_RDONLY in the ioctl shutdown paths, for whatever reason.
Unfortunately it's not quite as simple as doing the same in the fatal
error path. If I do that, it looks like the async error handling worker
can race with a freeze, and freeze does two different things wrt
internal locks depending on whether sb_rdonly() is true or not.
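To make the race concrete, here is a toy, single-threaded model in C. It is not the VFS or bcachefs code. struct toy_sb and the toy_* functions are hypothetical, and the only assumption carried over is that freeze and thaw each choose their locking path from the read-only state at the moment they run. If the error worker flips the read-only flag between freeze and thaw, the two sides disagree, and the lock taken at freeze time is never released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical. */
struct toy_sb {
    bool rdonly;       /* stands in for sb_rdonly() */
    int internal_lock; /* > 0 while the fs-internal freeze lock is held */
};

/* Freeze takes the internal lock only if the fs is still writable. */
static void toy_freeze(struct toy_sb *sb)
{
    if (!sb->rdonly)
        sb->internal_lock++;
}

/* Thaw re-checks the read-only state and releases only if writable. */
static void toy_thaw(struct toy_sb *sb)
{
    if (!sb->rdonly)
        sb->internal_lock--;
    /* a thaw must never release a lock that freeze did not take */
    assert(sb->internal_lock >= 0);
}

/* The async error worker flips the fs read-only at an arbitrary point. */
static void toy_error_worker(struct toy_sb *sb)
{
    sb->rdonly = true;
}

/* Freeze, optionally let the error worker run, then thaw. Returns the
 * number of internal locks still held afterwards (0 means balanced). */
static int toy_run(bool error_between_freeze_and_thaw)
{
    struct toy_sb sb = { .rdonly = false, .internal_lock = 0 };

    toy_freeze(&sb);
    if (error_between_freeze_and_thaw)
        toy_error_worker(&sb);
    toy_thaw(&sb);
    return sb.internal_lock;
}
```

toy_run(false) returns 0 (balanced), while toy_run(true) returns 1, i.e. a leaked internal lock. Any serialization in the error handler would have to rule out that second interleaving.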

The error worker and freeze can possibly be serialized via the right combination of
s_umount and freeze protection in the error handler to ensure we never
see the wrong combination of being freeze locked && sb_rdonly(), but
that requires a little more thought. Given the test could also easily
change to checking for -EROFS or some such on bcachefs vs. "ro" in the
mount status, this does strike me as more of a minor shutdown
inconsistency than a significant freeze or shutdown bug. What might be
more useful at some point is to audit the various read-only paths for
consistent behavior regardless of how they are invoked (e.g. "ro" in
the mount options for both remount and shutdown) and for proper
serialization.
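On the test side, a probe for -EROFS could look something like this instead of parsing mount options. Again this is hypothetical: dir_is_erofs() and the .erofs_probe file name are made up for illustration. A shut-down filesystem may return a different error than EROFS, so any other failure is passed back as a negative errno rather than being treated as "read-only".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: try to create (and immediately remove) a probe
 * file in dir. Returns 1 if the create was refused with EROFS, 0 if it
 * succeeded, and -errno for any other failure. */
static int dir_is_erofs(const char *dir)
{
    assert(dir != NULL);
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/.erofs_probe", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno == EROFS ? 1 : -errno;

    /* the fs accepted the write; clean up the probe file */
    close(fd);
    unlink(path);
    return 0;
}
```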

Brian
