----- Original Message -----
> On Fri, Dec 04, 2015 at 09:51:53AM -0500, Bob Peterson wrote:
> > it's from the fenced process, and if so, queue the final put. That should
> > mitigate the problem.
> 
> Bob, I'm perplexed by the focus on fencing; this issue is broader than
> fencing as I mentioned in bz 1255872.  Over the years that I've reported
> these issues, rarely if ever have they involved fencing.  Any userland
> process, not just the fencing process, can allocate memory, fall into the
> general shrinking path, get into gfs2 and dlm, and end up blocked for some
> undefined time.  That can cause problems in any number of ways.
> 
> The specific problem you're focused on may be one of the easier ways of
> demonstrating the problem -- making the original userland process one of
> the cluster-related processes that gfs2/dlm depend on, combined with
> recovery when those processes do an especially large amount of work that
> gfs2/dlm require.  But problems could occur if any process is forced to
> unwittingly do this dlm work, not just a cluster-related process, and it
> would not need to involve recovery (or fencing which is one small part of
> it).
> 
> I believe in gfs1 and the original gfs2, gfs had its own mechanism/threads
> for shrinking its cache and doing the dlm work, and would not do anything
> from the generic shrinking paths because of this issue.  I don't think
> it's reasonable to expect random, unsuspecting processes on the system to
> perform gfs2/dlm operations that are often remote, lengthy, indefinite, or
> unpredictable.  I think gfs2 needs to do that kind of heavy lifting from
> its own dedicated contexts, or from processes that are explicitly choosing
> to use gfs2.
> 
> 
Hi Dave,

Thanks for your input.
You're right, of course. The problem can affect any process that causes the
shrinker to run, not just fenced.

Yes, the original GFS2 did not have this problem. When gfs2_clear_inode()
was called, it called gfs2_glock_schedule_for_reclaim(), which queued the inode
glock onto a special list handled by the gfs2_glockd daemon. The glock's
reference count was bumped so that gfs2_clear_inode() was never the last
guy out.
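
For reference, the old pattern was roughly the following. This is only a sketch
from memory, not the historical code: the ex_ names, the kref and the kthread
details are stand-ins for the real glock structures and gfs2_glockd.

#include <linux/kthread.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Illustrative stand-in for a glock; not the real struct gfs2_glock. */
struct ex_glock {
        struct kref ref;
        struct list_head reclaim;
};

static LIST_HEAD(ex_reclaim_list);
static DEFINE_SPINLOCK(ex_reclaim_lock);
static struct task_struct *ex_glockd_task; /* created with kthread_run() at mount */

static void ex_glock_free(struct kref *kref)
{
        kfree(container_of(kref, struct ex_glock, ref));
}

/* Called from the clear_inode/evict path: take an extra reference and defer
 * the final put, so the evicting process is never the last guy out. */
static void ex_schedule_for_reclaim(struct ex_glock *gl)
{
        kref_get(&gl->ref);
        spin_lock(&ex_reclaim_lock);
        list_add_tail(&gl->reclaim, &ex_reclaim_list);
        spin_unlock(&ex_reclaim_lock);
        wake_up_process(ex_glockd_task);
}

/* Daemon loop: drop the deferred references from its own context, where
 * blocking on dlm traffic is acceptable. */
static int ex_glockd(void *data)
{
        while (!kthread_should_stop()) {
                struct ex_glock *gl = NULL;

                spin_lock(&ex_reclaim_lock);
                if (!list_empty(&ex_reclaim_list)) {
                        gl = list_first_entry(&ex_reclaim_list,
                                              struct ex_glock, reclaim);
                        list_del_init(&gl->reclaim);
                }
                spin_unlock(&ex_reclaim_lock);

                if (gl)
                        kref_put(&gl->ref, ex_glock_free);
                else
                        schedule_timeout_interruptible(HZ);
        }
        return 0;
}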

The problem with that approach is that it uses a centralized list, which means
there's no parallelism and a backlog of glocks waiting to be reclaimed can
build up. If a glock on that list needs to be reacquired because its block was
reused for a different inode, we could end up reaping glocks that are still
being (re)used.

We could still do that, but it would mean either reintroducing the gfs2_glockd
daemon or giving our existing daemon, gfs2_quotad, more responsibilities, which
would make it uglier and more complex than it is today.

My previous attempts to solve this used a work queue to defer the final
gfs2_glock_put(). That fixed the problem in every case except when work was
already queued for the glock, since a pending work item cannot be queued a
second time and the extra put is lost.
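
Concretely, that attempt amounted to something like the following (again only a
sketch, not the actual patch, reusing the ex_ stand-ins above; glock_put_wq and
put_work are made-up names):

#include <linux/kref.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* ex_glock extended with an embedded work item for the deferred put. */
struct ex_glock_w {
        struct kref ref;
        struct work_struct put_work;
};

static struct workqueue_struct *glock_put_wq;

static void ex_glock_w_free(struct kref *kref)
{
        kfree(container_of(kref, struct ex_glock_w, ref));
}

static void ex_put_work_func(struct work_struct *work)
{
        struct ex_glock_w *gl = container_of(work, struct ex_glock_w, put_work);

        kref_put(&gl->ref, ex_glock_w_free);    /* heavy, possibly blocking */
}

/* Called instead of dropping the last reference directly.  The catch:
 * queue_work() returns false when the item is already pending, so a second
 * deferred put for the same glock is silently dropped. */
static void ex_glock_put_async(struct ex_glock_w *gl)
{
        if (!queue_work(glock_put_wq, &gl->put_work))
                pr_warn("deferred put lost: work already queued\n");
}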

When you think about it, using delayed work accomplishes the same thing, but
with parallelism (when it works). Perhaps I just need to focus on a way to
allow the work to be queued multiple times in the ideal case or, alternatively,
on an atomic counter that tracks how many times the work should be executed.
Or something similar.
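
The counter idea would look something like this (a sketch only; put_count is a
made-up field, and a real version would have to sort out the lifetime of the
embedded work item once the last reference is dropped):

#include <linux/atomic.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* ex_glock variant carrying a count of deferred puts owed. */
struct ex_glock_c {
        struct kref ref;
        atomic_t put_count;
        struct work_struct put_work;
};

static void ex_glock_c_free(struct kref *kref)
{
        kfree(container_of(kref, struct ex_glock_c, ref));
}

/* Record one more owed put and make sure the work runs.  If the work item
 * is already pending, nothing is lost: the counter remembers it. */
static void ex_glock_put_deferred(struct ex_glock_c *gl)
{
        atomic_inc(&gl->put_count);
        queue_work(glock_put_wq, &gl->put_work);
}

static void ex_put_count_work_func(struct work_struct *work)
{
        struct ex_glock_c *gl = container_of(work, struct ex_glock_c, put_work);
        int n = atomic_xchg(&gl->put_count, 0); /* drain all owed puts */

        /* Note: the last kref_put() frees gl, so real code must ensure no
         * further puts are counted against a glock being torn down. */
        while (n--)
                kref_put(&gl->ref, ex_glock_c_free);
}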

Regards,

Bob Peterson
Red Hat File Systems
