Hi Edwin,
On Thu, 7 Mar 2019 at 16:55, Edwin Török <[email protected]> wrote:
> On 06/03/2019 16:14, Andreas Gruenbacher wrote:
> > Mark Syms has reported seeing tasks that are stuck waiting in
> > find_insert_glock. It turns out that struct lm_lockname contains four
> > padding
> > bytes on 64-bit architectures that function glock_waitqueue doesn't skip
> > when
> > hashing the glock name. As a result, we can end up waking up the wrong
> > waitqueue, and the waiting tasks will be stuck.
> >
> > Use offsetofend instead of sizeof to get the struct size without the
> > padding.
> >
> > Reported-by: Mark Syms <[email protected]>
> > Signed-off-by: Andreas Gruenbacher <[email protected]>
>
> Thank you for your patch, initial testing looks promising when replacing the
> schedule_timeout patch with this one (no deadlocks/stuck tasks found so far).
> Will let you know when our testing completes.
okay great, thanks. Hope we can still get this into the 5.1 kernel.
> > ---
> > fs/gfs2/glock.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > index f66773c71bcde..01611f363cd59 100644
> > --- a/fs/gfs2/glock.c
> > +++ b/fs/gfs2/glock.c
> > @@ -107,7 +107,8 @@ static int glock_wake_function(wait_queue_entry_t
> > *wait, unsigned int mode,
> >
> > static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
> > {
> > - u32 hash = jhash2((u32 *)name, sizeof(*name) / 4, 0);
> > + unsigned int size = offsetofend(struct lm_lockname, ln_type);
>
> This is the same as ht_parms.key_len, which I assume was the intention.
> Should we just use ht_parms.key_len for size instead of calculating it again
> to avoid the 2 values becoming
> different in the future again?
We can do that, yes.
> > + u32 hash = jhash2((u32 *)name, size / 4, 0);
> >
> > return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
> > }
Thanks,
Andreas