Hi,

On Wed, Feb 16, 2022 at 7:37 PM Andreas Gruenbacher <agrue...@redhat.com> wrote:
>
> On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring <aahri...@redhat.com> wrote:
> >
> > Hi,
> >
> > On Wed, Feb 16, 2022 at 11:08 AM Andreas Gruenbacher
> > <agrue...@redhat.com> wrote:
> > >
> > > On Wed, Feb 16, 2022 at 4:53 PM Alexander Aring <aahri...@redhat.com> 
> > > wrote:
> > > > There are several sanity checks and recover handling if they occur in
> > > > the dlm plock handling. They should never occur otherwise we have a bug
> > > > in the code. To make such bugs more visible we remove the recover
> > > > handling and add a WARN_ON() on those sanity checks.
> > > >
> > > > Signed-off-by: Alexander Aring <aahri...@redhat.com>
> > > > ---
> > > >  fs/dlm/plock.c | 32 ++++----------------------------
> > > >  1 file changed, 4 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > > index a10d2bcfe75a..55fba2f0234f 100644
> > > > --- a/fs/dlm/plock.c
> > > > +++ b/fs/dlm/plock.c
> > > > @@ -157,13 +157,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 
> > > > number, struct file *file,
> > > >                 goto out;
> > > >         }
> > > >
> > > > -       spin_lock(&ops_lock);
> > > > -       if (!list_empty(&op->list)) {
> > > > -               log_error(ls, "dlm_posix_lock: op on list %llx",
> > > > -                         (unsigned long long)number);
> > > > -               list_del(&op->list);
> > > > -       }
> > > > -       spin_unlock(&ops_lock);
> > > > +       WARN_ON(!list_empty(&op->list));
> > >
> > > Why don't those checks need the ops_lock spin lock anymore?
> > > Why does it make sense to get rid of the list_del calls?
> >
> > My understanding is that those list_del() calls try to recover
> > something if a sanity check hits. The list_emptry() check should
> > always be true at this point no matter if lock is held or not.
> > Therefore no lock is required here to do some sanity checking.
>
> I don't immediately see what, other than the spin lock, would
> guarantee a consistent memory view. In other words, without taking the
> spin lock, 'list_empty(&op->list)' might still be true on one CPU even
> though another CPU has already added 'op' to a list. So please, when
> changing the locking somewhere, explain why the change is correct
> beyond just stating that the locking isn't needed.
>

Three WARN_ON()'s are behind wait_event()/wait_event_interruptible()
assuming they have the right barriers here to provide a consistent
memory view.

The other one in dlm_plock_callback() was the one which makes me think
about what the code is trying to do here. My guess is that the whole
locking concept is somehow connected, that some posix lock api/misc
char devices calls cannot be called parallel and they occur in a
specific sequence (maybe this sequence is only guaranteed by
dlm_controld?).
However the dlm_plock_callback() just did a
"list_del_init(&op->list);" before the WARN_ON() and I guess nothing
should interfere with this by another list manipulation otherwise the
previous sanity check would hit.

I would remove those checks but I don't trust that the code is
assuming that some things cannot run parallel (as the API calls it)
and in a specific sequence, otherwise there would be multiple
use-after-free. My assumption here is that those sanity checks somehow
try to confirm that.

Does this justify(if this explanation is correct)/explain the patch enough?

- Alex

Reply via email to