Hi Kent,

I managed to catch up a bit on the fsync thing we had talked about
earlier this year. Skimming through the original thread, I think this
mail [1] summarizes things best. The short of it is that we can address
at least a couple of the failures we're seeing with fstests
generic/441,484 with a couple small tweaks in bcachefs and fstests, but
these aren't necessarily the longer term fix.

Firstly, patch 1 is just another unrelated (un)freeze fixup I happened
across when hacking around. I don't know that it's currently associated
with any test failures; I just include it here for convenience.

Patch 2 of this series tweaks the fsync path to return on writeback
error, which makes it a bit more deliberate / less aggressive and helps
avoid spurious shutdowns. The reasoning behind this is that if fsync
fails, the user can't be certain of the state of things on disk anyways.
What I've observed with this patch is that it seems to prevent
generic/484 failures (though I'm not sure that is guaranteed) and, based
on the original thread, it can address generic/441 when combined with an
fstests tweak to allow the fs a bit of time to idle before transitioning
to the dm error table.

All in all, I still think this is a reasonable incremental improvement.
I think the longer term fix here is more something like the ability to
retry metadata I/O on failure such that we can be a little less
sensitive to emergency shutdowns. I had managed to hack up a quick
prototype of metadata I/O failure/retries a few weeks or so ago just to
explore how difficult it might be, and it didn't seem that bad IIRC. The
bigger question in my mind is how to deal with journal writes,
particularly if journal I/O is any more frequent than in the common
filesystems fstests tends to accommodate (i.e. xfs, ext4, etc.). I
suspect this is worth discussing further in an upcoming call.

Also just as a data point, btrfs skips generic/441 in favor of its own
custom variant in btrfs/146. That test runs the same fsync tool, but it
looks like it sets up a combination of data striping (raid0) and
metadata replication on the fs, presumably to facilitate data I/O errors
on single disk errors without triggering high level metadata errors.
This might be another option worth considering for bcachefs if we can do
something similar...

Thoughts, reviews, flames appreciated.

Brian

[1] https://lore.kernel.org/linux-bcachefs/Y+EduoshRHXec+XU@bfoster/

Brian Foster (2):
  bcachefs: don't attempt rw on unfreeze when shutdown
  bcachefs: return from fsync on writeback error to avoid early shutdown

 fs/bcachefs/fs-io.c | 14 +++++++++-----
 fs/bcachefs/fs.c    |  3 +++
 2 files changed, 12 insertions(+), 5 deletions(-)

-- 
2.42.0
