----- Original Message -----
> Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
> rg_skip field has the following meaning:
> 
> - If rg_skip is zero, it is considered unset and not useful.
> - If rg_skip is non-zero, its value will be the number of blocks between
>   this rgrp's address and the next rgrp's address. This can be used as a
>   hint by fsck.gfs2 when rebuilding a bad rindex, for example.
> 
> When gfs2_rgrp_bh_get() reads a resource group header and finds rg_skip
> to be 0 it will attempt to set it to the difference between its rd_addr
> and the rd_addr of the next resource group.
> 
> The only special case is the final rgrp, which always has a rg_skip of
> 0. It is not set to a special value (like -1) because, when the
> filesystem is grown, the rgrp will no longer be the final one and it
> will then need to have its rg_skip field set. The overhead of this
> special case is a gfs2_rgrpd_get_next() call each time
> gfs2_rgrp_bh_get() is called for the final resource group.
> 
> For the other resource groups, if the rg_skip field is 0, it is set
> appropriately and then the only overhead becomes the rgd->rg_skip == 0
> comparison in gfs2_rgrp_bh_get().
> 
> Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
> the rg_skip field can get set back to 0 in cases where nodes with and
> without this patch are mixed in a cluster. In some cases, the field may
> bounce between being set by one node and then zeroed by another which
> may harm performance slightly, e.g. when two nodes create many small
> files. In testing this situation is rare but it becomes more likely as
> the filesystem fills up and there are fewer resource groups to choose
> from. The problem goes away when all nodes are running with this patch.
> Dipping into the space currently occupied by the rg_reserved field would
> have resulted in the same problem as it is also explicitly zeroed, so
> unfortunately there is no other way around it.
> 
> Signed-off-by: Andrew Price <anpr...@redhat.com>

Hi Andy,

I've been talking about doing something like this for years, so it's
good to see someone finally acting on it.

Although this is a good first stab at the solution, my main concern about
this implementation is that, AFAICT, it doesn't take read-only mounts into
account. In fact, a "spectator" mount might even hit a BUG_ON in
gfs2_trans_begin(), since a spectator has no journal. But it's close.
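
To make the concern concrete, here is a rough user-space sketch of the
kind of guard I have in mind. It is not kernel code and it is untested
against the patch: struct model_rgrp, struct model_fs, the read_only flag
and model_update_skip() are all hypothetical stand-ins for the rgrp
descriptor, the superblock state and the update done in
gfs2_rgrp_bh_get(). The only point is that the write is skipped whenever
the mount cannot start a transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_rgrp {
    uint64_t addr;  /* models rd_addr */
    uint32_t skip;  /* models rg_skip; 0 means unset */
};

struct model_fs {
    bool read_only;            /* read-only or spectator mount (assumed flag) */
    struct model_rgrp *rgrps;  /* resource groups, sorted by addr */
    size_t count;
};

/*
 * Models the rg_skip update performed when a resource group header is read.
 * Returns true only if the field was actually written.
 */
static bool model_update_skip(struct model_fs *fs, size_t i)
{
    assert(i < fs->count);
    struct model_rgrp *rgd = &fs->rgrps[i];

    if (rgd->skip != 0)
        return false;  /* already set: only the comparison is paid */
    if (fs->read_only)
        return false;  /* assumed guard: never start a transaction here */
    if (i + 1 == fs->count)
        return false;  /* final rgrp keeps a skip of 0 until the fs grows */

    /* Distance in blocks from this rgrp's address to the next one's. */
    rgd->skip = (uint32_t)(fs->rgrps[i + 1].addr - rgd->addr);
    return true;
}
```

With three rgrps at addresses 17, 1041 and 2065, the first one gets a
skip of 1024 on a read-write mount and is left at 0 on a read-only one.
In the real code the check would presumably need to look at whatever
state distinguishes a spectator or read-only mount; I haven't worked out
exactly which test is right.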

Regards,

Bob Peterson
Red Hat File Systems
