Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

Bob Peterson Mon, 08 Oct 2018 06:25:56 -0700

----- Original Message -----
> 
> 
> On 08/10/18 13:59, Mark Syms wrote:
> > That sounds entirely reasonable so long as you are absolutely sure that
> > nothing is ever going to mess with that glock, we erred on the side of
> > more caution not knowing whether it would be guaranteed safe or not.
> >
> > Thanks,
> >
> >     Mark
> We should have a look at the history to see how that wait got added.
> However the "dead" flag here means "don't touch this glock" and is there
> so that we can separate the marking dead from the actual removal from
> the list (which simplifies the locking during the scanning procedures)
> 
> Steve.
(snip)
> > That is a bit odd. In fact that whole function looks odd. I wonder why it
> > needs to wait in the first place. It should be a simple comparison here.
> > If the glock is dead then nothing else should touch it, so we are safe to
> > add a new one into the hash table. The wait is almost certainly a bug,
> >
> > Steve.


Hi,

Andreas and I both did a ton of work here trying to get this right, and it
was all done for the problems we had with transitioning dinodes from
unlinked to free, and how the glocks beneath them were competing by way
of the in-core inodes. The glocks typically outlive the inodes, but we
had tons of problems with inodes coming and going, their associated glocks
coming and going, and being marked dead, and the rcus underneath them.

The problem is that inodes are coming and going, some with I_FREE, or
I_WILL_FREE. Glocks are also coming and going, often for the same block
at the same time, but two different inodes for two different files, and
both inode and iopen glocks.

The problems stem from remote unlinks causing "delete work" while the
inodes and glocks are both being allocated and freed in rapid succession.

We encountered lots of hangs and use-after-free problems. My point is not
that we shouldn't fix it, but merely that we need to be VERY careful here
not to reintroduce one of the countless problems we fixed. Mark's original
patch seems pretty low risk to me, but if we go that route, I'd like to
see a smaller timeout; HZ (one second) seems like a very long time. Better still to
see if there's a better fix that doesn't break anything.

Bob Peterson
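
For readers weighing the timeout remark above: in the Linux kernel a timeout of HZ jiffies is roughly one second whatever the configured tick rate, so a shorter wait would be written as a fraction of HZ. The following is only a userspace sketch of that arithmetic, not code from the patch. The helper name ticks_to_ms and the tick rates used in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): convert a tick count to
 * milliseconds for a given tick rate "hz" (ticks per second).
 * Integer division truncates, so results are rounded down.
 */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);                 /* a zero tick rate is meaningless */
    return ticks * 1000UL / hz;     /* ticks / hz seconds, in ms */
}
```

For example, ticks_to_ms(hz, hz) is 1000 for any tick rate, which is the one-second wait discussed above, while ticks_to_ms(hz / 10, hz) is 100 for tick rates of 100, 250, or 1000.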
