----- Original Message -----
> Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
> rg_skip field has the following meaning:
>
> - If rg_skip is zero, it is considered unset and not useful.
> - If rg_skip is non-zero, its value will be the number of blocks between
>   this rgrp's address and the next rgrp's address. This can be used as a
>   hint by fsck.gfs2 when rebuilding a bad rindex, for example.
>
> When gfs2_rgrp_bh_get() reads a resource group header and finds rg_skip
> to be 0 it will attempt to set it to the difference between its rd_addr
> and the rd_addr of the next resource group.
>
> The only special case is the final rgrp, which always has a rg_skip of
> 0. It is not set to a special value (like -1) because, when the
> filesystem is grown, the rgrp will no longer be the final one and it
> will then need to have its rg_skip field set. The overhead of this
> special case is a gfs2_rgrpd_get_next() call each time
> gfs2_rgrp_bh_get() is called for the final resource group.
>
> For the other resource groups, if the rg_skip field is 0, it is set
> appropriately and then the only overhead becomes the rgd->rg_skip == 0
> comparison in gfs2_rgrp_bh_get().
>
> Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
> the rg_skip field can get set back to 0 in cases where nodes with and
> without this patch are mixed in a cluster. In some cases, the field may
> bounce between being set by one node and then zeroed by another which
> may harm performance slightly, e.g. when two nodes create many small
> files. In testing this situation is rare but it becomes more likely as
> the filesystem fills up and there are fewer resource groups to choose
> from. The problem goes away when all nodes are running with this patch.
> Dipping into the space currently occupied by the rg_reserved field would
> have resulted in the same problem as it is also explicitly zeroed, so
> unfortunately there is no other way around it.
>
> Signed-off-by: Andrew Price <anpr...@redhat.com>

Hi Andy,

I've been talking about doing something like this for years, so it's good
to see someone finally acting on it. Although this is a good first stab at
the solution, my main concern about this implementation is that, AFAICT,
it doesn't take read-only mounts into account. In fact, a "spectator" mount
might even cause it to BUG_ON from gfs2_trans_begin, since there's no
journal. But it's close.

Regards,

Bob Peterson
Red Hat File Systems
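
For illustration only, the rg_skip rule from the quoted description, together with the read-only concern raised in the reply, might be sketched roughly as below. This is not the patch's code: struct rgrp_sketch, rg_skip_hint(), rg_skip_update() and the read_only flag are hypothetical stand-ins, the 32-bit width of rg_skip is an assumption, and the array is assumed to be sorted by rd_addr. The real code would work on struct gfs2_rgrpd inside gfs2_rgrp_bh_get() and would need a transaction to write the header back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an in-memory resource group. */
struct rgrp_sketch {
	uint64_t rd_addr;	/* block address of this rgrp's header */
	uint32_t rg_skip;	/* 0 = unset, else blocks to the next rgrp (width assumed) */
};

/*
 * Distance in blocks from rgrps[i] to the next rgrp.
 * Returns 0 for the final rgrp, which has nothing to measure against.
 * Assumes the array is sorted by rd_addr.
 */
static uint32_t rg_skip_hint(const struct rgrp_sketch *rgrps, size_t count,
			     size_t i)
{
	if (i + 1 >= count)
		return 0;
	return (uint32_t)(rgrps[i + 1].rd_addr - rgrps[i].rd_addr);
}

/*
 * Set rg_skip only if it is currently unset. Returns 1 if the field was
 * changed (i.e. the header would need writing back), 0 otherwise.
 * read_only is a hypothetical flag standing in for a read-only or
 * spectator mount, where no write (and so no transaction) is possible.
 */
static int rg_skip_update(struct rgrp_sketch *rgrps, size_t count, size_t i,
			  int read_only)
{
	uint32_t hint;

	assert(i < count);

	if (read_only)
		return 0;	/* never try to modify the header */
	if (rgrps[i].rg_skip != 0)
		return 0;	/* already set: only cost is this comparison */

	hint = rg_skip_hint(rgrps, count, i);
	if (hint == 0)
		return 0;	/* final rgrp: stays 0 until the fs is grown */

	rgrps[i].rg_skip = hint;
	return 1;
}
```

Checking the read-only condition before anything else means the sketch never reaches the point where a write would be attempted, which is the situation the reply warns could trip a BUG_ON in gfs2_trans_begin.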