Hi Andy,

On 5/5/23 3:44 AM, Andrew Price wrote:
Hi Bob,

On 04/05/2023 18:43, Bob Peterson wrote:
Before this patch function gfs2_dinode_dealloc would abort if it got a
bad return code from gfs2_rindex_update. The problem is that it left the
dinode in the unlinked (not free) state, which meant subsequent fsck
would clean it up and flag an error. That meant some of our QE tests
would fail.

As I understand it the test is an interrupted rename loop workload and gfs2_grow at the same time, and the bad return code is -EINTR, right?

Correct.

The sole purpose of gfs2_rindex_update, in this code path, is to read in
any newer rgrps added by gfs2_grow. But since this is a delete operation
it won't actually use any of those new rgrps. It can really only twiddle
the bits from "Unlinked" to "Free" in an existing rgrp. Therefore the
error should not prevent the transition from unlinked to free.

This patch makes gfs2_dinode_dealloc ignore the bad return code and
proceed with freeing the dinode so the QE tests will not be tripped up.

Is it really ok to ignore all potential errors here? I wonder if it should just ignore -EINTR (or whichever error the test produces) so that it can still fail well for errors like -EIO.

Good question.

The call to gfs2_rindex_update is really not even needed in gfs2_dinode_dealloc because this is the last stage of the delete where we are freeing the dinode itself. I've even considered removing the call altogether. So to fail the operation for such an inconsequential action's failure seems like throwing the proverbial baby out with the bath water.

Maybe we should just remove the call to gfs2_rindex_update altogether and delegate it to earlier parts of the evict/delete process.

The original intent of calling gfs2_rindex_update in the evict/delete sequence was to ensure we have the newest resource groups from gfs2_grow because any file being evicted may have references to the new rgrps created by gfs2_grow that need to be freed, even if the dinode itself resides in an old rgrp. This is pretty much true for all parts of the process that evicts deleted dinodes except for gfs2_dinode_dealloc itself. For example, a new dinode might have an eattr, indirect block, data block, or whatever, in one of the new rgrps added by gfs2_grow.

However, since the inode was created/instantiated (which must be true in order for it to be evicted), the dinode itself must reside in a previously instantiated rgrp, and therefore the call to gfs2_rindex_update is not needed at all.

So if the call to it fails, imho, it shouldn't fail the rest of the gfs2_dinode_dealloc, regardless of the failure.

The next question you may ask is: why don't we get the -EINTR when reading in new rgrps for the purposes of deleting other parts of the file, its eattrs, indirect blocks, data blocks, etc.? The answer is: I don't know, but I suspect we have other bugs lurking in that area. I suspect if we try hard enough we can create other problems in which the punch_hole code doesn't read in new rgrps.

It may be tempting to think that this also cannot happen because the rgrps must also be instantiated for any eattrs, metadata, data to be assigned to the dinode being evicted/deleted. But that's non-clustered file system thinking.

In gfs2, it is possible for one cluster node to read in new rgrps from gfs2_grow, then assign those blocks to a new dinode that's already open on a different node, then delete that file, causing a second cluster node to evict, and try to reference those new blocks before the new rgrps are read in. So we need to be very careful.

We should probably spend some time trying to force these conditions to see if we can flush out more bugs.

For some reason, with this test, we only see this particular problem with gfs2_dinode_dealloc, and that's the problem I'm trying to fix with the patch.

Regards,

Bob Peterson

Reply via email to