Re: [Cluster-devel] [PATCH] gfs2: ignore rindex_update failure in dinode_dealloc

Bob Peterson Fri, 05 May 2023 05:29:56 -0700

Hi Andy,

On 5/5/23 3:44 AM, Andrew Price wrote:

Hi Bob,


On 04/05/2023 18:43, Bob Peterson wrote:

Before this patch, function gfs2_dinode_dealloc would abort if it got a
bad return code from gfs2_rindex_update. The problem is that it left the
dinode in the unlinked (not free) state, which meant a subsequent fsck
would clean it up and flag an error. That meant some of our QE tests
would fail.

As I understand it, the test is an interrupted rename loop workload with gfs2_grow running at the same time, and the bad return code is -EINTR, right?


Correct.

The sole purpose of gfs2_rindex_update, in this code path, is to read in
any newer rgrps added by gfs2_grow. But since this is a delete operation
it won't actually use any of those new rgrps. It can really only twiddle
the bits from "Unlinked" to "Free" in an existing rgrp. Therefore the
error should not prevent the transition from unlinked to free.

This patch makes gfs2_dinode_dealloc ignore the bad return code and
proceed with freeing the dinode so the QE tests will not be tripped up.

Is it really OK to ignore all potential errors here? I wonder if it should just ignore -EINTR (or whichever error the test produces) so that it can still fail properly for errors like -EIO.


Good question.

The call to gfs2_rindex_update is really not even needed in gfs2_dinode_dealloc, because this is the last stage of the delete, where we are freeing the dinode itself. I've even considered removing the call altogether. So failing the whole operation because such an inconsequential action failed seems like throwing the proverbial baby out with the bath water.

Maybe we should just remove the call to gfs2_rindex_update altogether and delegate it to earlier parts of the evict/delete process.

The original intent of calling gfs2_rindex_update in the evict/delete sequence was to ensure we have the newest resource groups from gfs2_grow, because any file being evicted may reference blocks in the new rgrps created by gfs2_grow that need to be freed, even if the dinode itself resides in an old rgrp. This is true for pretty much every part of the process that evicts deleted dinodes, except for gfs2_dinode_dealloc itself. For example, a new dinode might have an eattr, indirect block, data block, or whatever in one of the new rgrps added by gfs2_grow.

However, since the inode was created/instantiated (which must be true in order for it to be evicted), the dinode itself must reside in a previously instantiated rgrp, and therefore the call to gfs2_rindex_update is not needed at all.

So if the call fails, imho, it shouldn't fail the rest of gfs2_dinode_dealloc, regardless of the failure.

The next question you may ask is: why don't we get -EINTR when reading in new rgrps for the purpose of deleting the other parts of the file, such as its eattrs, indirect blocks, and data blocks? The answer is: I don't know, but I suspect we have other bugs lurking in that area. I suspect that if we try hard enough, we can create other problems in which the punch_hole code doesn't read in new rgrps.

It may be tempting to think that this also cannot happen, because the rgrps must also have been instantiated for any eattrs, metadata, or data to be assigned to the dinode being evicted/deleted. But that's non-clustered file system thinking.

In gfs2, it is possible for one cluster node to read in new rgrps from gfs2_grow, assign blocks from them to a new dinode that's already open on a different node, and then delete that file, causing the second cluster node to evict it and try to reference those new blocks before it has read in the new rgrps. So we need to be very careful.

We should probably spend some time trying to force these conditions to see if we can flush out more bugs.

For some reason, with this test we only see this particular problem in gfs2_dinode_dealloc, and that's the problem I'm trying to fix with this patch.


Regards,

Bob Peterson
