Hi,

On Mon, 2012-01-09 at 11:46 -0500, David Teigland wrote:
> On Mon, Jan 09, 2012 at 04:36:30PM +0000, Steven Whitehouse wrote:
> > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > > This new method of managing recovery is an alternative to
> > > the previous approach of using the userland gfs_controld.
> > > 
> > > - use dlm slot numbers to assign journal id's
> > > - use dlm recovery callbacks to initiate journal recovery
> > > - use a dlm lock to determine the first node to mount fs
> > > - use a dlm lock to track journals that need recovery
> > 
> > I've just been looking at this again, and a question springs to mind...
> > how does this deal with nodes which are read-only or spectator mounts?
> > In the old system we used to propagate that information to gfs_controld
> > but I've not spotted anything similar in the patch so far, so I'm
> > wondering whether it needs to know that information or not,
> 
> The dlm allocates a "slot" for all lockspace members, so spectator mounts
> (like readonly mounts) would be given a slot/jid.  In gfs_controld,
> spectator mounts are not given a jid (that came from the time when
> adding a journal required extending the device+fs.)  These days, there's
> probably no meaningful difference between spectator and readonly mounts.
> 

The issue is more about what happens at recovery time, though. For
spectator mounts, we don't have to care about recovery at all: if one
fails, it does not need to be recovered at the fs level. Spectator
mounts can check the journals, but cannot recover any, so as the first
mounter of the filesystem, they must fail to mount if any journals are
left dirty.

For read-only mounts, it is less clear: the first read-only mounter of
the fs must recover all journals. After that, read-only nodes currently
do not perform recovery, although we could change that; it isn't clear
what the correct behaviour is here, so we need to pick one and stick
with it. If a read-only node fails, we do not need to recover its
journal, since there is nothing to do (as with spectators).
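
To make that concrete, the first-mounter policy I have in mind looks
roughly like this. This is only a sketch; all of the names and types
below are invented for illustration, not real gfs2/dlm interfaces:

	struct mount_ctx {
		int num_journals;	/* journals present in the fs */
		int spectator;		/* mounted with -o spectator? */
	};

	/* Stand-ins for the real journal inspection/replay code. */
	int journal_is_dirty(struct mount_ctx *ctx, int jid);
	int recover_journal(struct mount_ctx *ctx, int jid);

	/*
	 * First-mounter policy: a spectator may inspect journals but
	 * never replay them, so any dirty journal fails the mount; a
	 * read-only (or read-write) first mounter must replay every
	 * dirty journal so that later mounters see a clean fs.
	 */
	static int first_mounter_recover(struct mount_ctx *ctx)
	{
		int jid;

		for (jid = 0; jid < ctx->num_journals; jid++) {
			if (!journal_is_dirty(ctx, jid))
				continue;
			if (ctx->spectator)
				return -1;	/* must fail the mount */
			if (recover_journal(ctx, jid))
				return -1;	/* replay failed */
		}
		return 0;
	}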

What I want to avoid is a cluster of read-only mounted nodes where one
fails and the rest of the cluster is then stuck, because it is trying
to recover the failed node's journal and no nodes capable of performing
that recovery are left in the cluster.
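
Put another way, whether a failed node's journal needs replay at all
should depend on its mount mode. Sketched again with invented names:

	enum mount_mode { MODE_SPECTATOR, MODE_RO, MODE_RW };

	/*
	 * Spectator and read-only nodes never dirty their journal, so
	 * their failure leaves nothing to replay, and an all-read-only
	 * cluster is never blocked waiting for a recovery that nobody
	 * can perform.
	 */
	static int journal_needs_replay(enum mount_mode failed_mode)
	{
		return failed_mode == MODE_RW;
	}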

If assigning a slot to spectator mounts means that spectator mounts now
have (effectively) a journal id assigned to them, even if they never
touch it, then that is a change we need to document carefully. Someone
with a small filesystem shared by a number of spectator mounters may be
unable to create the extra journals that are required when they upgrade
to the new system.

Steve.

