On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.sm...@citrix.com> wrote:
> > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.sm...@citrix.com> wrote:
> > > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > > > Hi,
> > > > > 
> > > > > On 08/10/18 14:10, Tim Smith wrote:
> > > > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > > > >>> That sounds entirely reasonable so long as you are absolutely
> > > > > >>> sure
> > > > > >>> that
> > > > > >>> nothing is ever going to mess with that glock, we erred on the
> > > > > >>> side
> > > > > >>> of
> > > > > >>> more caution not knowing whether it would be guaranteed safe or
> > > > > >>> not.
> > > > > >>> 
> > > > > >>> Thanks,
> > > > > >>> 
> > > > > >>>   Mark
> > > > > >> 
> > > > > >> We should have a look at the history to see how that wait got
> > > > > >> added.
> > > > > >> However the "dead" flag here means "don't touch this glock" and
> > > > > >> is
> > > > > >> there
> > > > > >> so that we can separate the marking dead from the actual removal
> > > > > >> from
> > > > > >> the list (which simplifies the locking during the scanning
> > > > > >> procedures)
> > > > > > 
> > > > > > You beat me to it :-)
> > > > > > 
> > > > > > I think there might be a bit of a problem inserting a new entry
> > > > > > with
> > > > > > the
> > > > > > same name before the old entry has been fully destroyed (or at
> > > > > > least
> > > > > > removed), which would be why the schedule() is there.
> > > > > 
> > > > > If the old entry is marked dead, all future lookups should ignore
> > > > > it. We
> > > > > should only have a single non-dead entry at a time, but that doesn't
> > > > > seem like it should need us to wait for it.
> > > > 
> > > > On the second call we do have the new glock to insert as arg2, so we
> > > > could
> > > > try to swap them cleanly, yeah.
> > > > 
> > > > > If we do discover that the wait is really required, then it sounds
> > > > > like
> > > > > as you mentioned above there is a lost wakeup, and that must
> > > > > presumably
> > > > > be on a code path that sets the dead flag and then fails to send a
> > > > > wake
> > > > > up later on. If we can drop the wait in the first place, that seems
> > > > > like
> > > > > a better plan,
> > > > 
> > > > Ooooh, I wonder if these two lines:
> > > >         wake_up_glock(gl);
> > > >         call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > > > 
> > > > in gfs2_glock_free() are the wrong way round?
> > > 
> > > No, what's important here is that the wake-up happens after the glock
> > > being freed has been removed from the rhashtable, and that's the case
> > > here. wake_up_glock actually accesses the glock, which may no longer
> > > be around after the call_rcu, so swapping the two lines would
> > > introduce a use-after-free bug.
> > 
> > Yes, realised that :-/
> > 
> > > There must be another reason for the missed wake-up. I'll have to
> > > study the code some more.
> > 
> > I think it needs to go into gfs2_glock_dealloc(), in such a way as to
> > avoid
> > that problem. Currently working out a patch to do that.
> 
> That doesn't sound right: find_insert_glock is waiting for the glock
> to be removed from the rhashtable. In gfs2_glock_free, we remove the
> glock from the rhashtable and then we do the wake-up. Delaying the
> wakeup further isn't going to help (but it might hide the problem).

The only way we can get the problem we're seeing is if we get an effective 
order of

T0: wake_up_glock()
T1: prepare_to_wait()
T1: schedule()

so clearly there's a way for that to happen. Any other order and either 
schedule() doesn't sleep or it gets woken.

The only way I can see at the moment to ensure that wake_up_glock() *cannot* 
get called until after prepare_to_wait() is to delay it until the read_side 
critical sections are done, and the first place that's got that property is 
the start of gfs2_glock_dealloc(), unless we want to add synchronise_rcu() to 
gfs2_glock_free() and I'm guessing there's a reason it's using call_rcu() 
instead.

I'll keep thinking about it.

> Instrumenting glock_wake_function might help though.
> 
> Thanks,
> Andreas


-- 
Tim Smith <tim.sm...@citrix.com>


Reply via email to