Re: [Cluster-devel] [GFS2 PATCH] GFS2: Instruct DLM to avoid queue convert slowdowns

Steven Whitehouse Tue, 10 Apr 2012 07:50:16 -0700

Hi,

On Tue, 2012-04-10 at 10:01 -0400, Bob Peterson wrote:
> ----- Original Message -----
> | Hi,
> | 
> | On Thu, 2012-04-05 at 12:11 -0400, Bob Peterson wrote:
> | > Hi,
> | > 
> | > Here's another patch (explanation below). This patch replies upon
> | > a DLM patch that hasn't fully gone upstream yet, so perhaps it
> | > shouldn't be added to the nmw tree until it is. This greatly
> | > improves the performance of gfs2_grow in a clustered gfs2 file
> | > system.
> | > 
> | > Regards,
> | > 
> | I'm still not very keen on dragging in this bit of dlm. If it is
> | really
> | needed, then we should use the copy in the dlm itself and not add our
> | own copy of it. Also, why can we not use the GLF_BLOCKING flag for
> | this
> | purpose? Is there some issue with try locks in that case?
> 
> To answer your questions:
> (1) Yes, this bit is really needed.
> (2) We cannot use the actual dlm flags because they're part of
>     dlm/lock.c and not in a header. I can request they be moved
>     to a header file, but that's at the whim of upstream dlm.
Indeed, but we don't normally duplicate code like this, and in this case
I'm suggesting that it is not required anyway.


> (3) We can't use GLF_BLOCKING because it's a different issue.
>     The issue is this: Unless instructed otherwise, dlm will
>     grant GFS2 (or any caller) locks as it sees fit, and when there
>     is a hotly contested file (such as rindex, once it's been
>     updated by gfs2_grow and needed immediately by all nodes in
>     a cluster) dlm often finds it more convenient to grant the lock
>     to a process on the current owner node rather than bouncing the
>     lock to a different node.
> 
>     That can result in one node "hogging" the lock for long periods of
>     time. That means a gfs2_grow can take 30 minutes to complete
>     rather than 0.3 seconds. To use a farming analogy: your animals
>     are starving because the hogs at the trough are eagerly fighting
>     over the morsel at the bottom, and not giving the farmer enough
>     of a break to put their food in. To avoid this fighting behavior
>     dlm has the DLM_LKF_QUECVT bit which basically says lock upgrades
>     (the farmer) over the convenience of keeping the lock on the same
>     node (the hogs). Employing this bit does wonders for gfs2_grow.
>     However, we can only use the bit on dlm lock upgrades. If we
>     use it in other cases, dlm goes through an error path that
>     results in a hang.
> 
I still don't understand why the GLF_BLOCKING flag is not the correct
condition here - see my reply to Dave.

> | When you say that this relies upon this dlm patch, what does that
> | mean?
> 
> (4) It means that there's a DLM bug whereby using the DLM_LKF_QUECVT
>     bit causes DLM (and therefore GFS2) to hang in certain situations.
>     There is a patch to fix this bad DLM behavior but it hasn't gone
>     upstream yet.
> 
> | What are the consequences of having this patch but not the dlm one?
> 
> (5) The consequences of having this patch but not the DLM one are
>     disastrous: gfs2 is likely to hang.
> 
> | I'm
> | wondering whether I should hold off on this at least until the dlm
> | one
> | has been finalised and applied.
> 
> (6) Yes, that's what we need to do.
> 
Ok, sounds good to me,

Steve.

> | Aside from those points though, this is a really big step forward and
> | should make a measurable difference to performance across the board
> | when
> | contention between multiple nodes occurs. So I'm very happy to see
> | such
> | significant progress in this area,
> | 
> | Steve.
> | 
> | 
> | > Bob Peterson
> | > Red Hat File Systems
> | > 
> | > Signed-off-by: Bob Peterson <rpete...@redhat.com>
> | > ---
> | > Author: Bob Peterson <rpete...@redhat.com>
> | > Date:   Wed Apr 4 15:14:51 2012 -0500
> | > 
> | >     GFS2: Instruct DLM to avoid queue convert slowdowns
> | >     
> | >     This patch instructs DLM to prevent an "in place" conversion,
> | >     where the
> | >     lock just stays on the granted queue, and instead forces the
> | >     conversion to
> | >     the back of the convert queue. This is done on upward
> | >     conversions only.
> | >     
> | >     This is useful in cases where, for example, a lock is
> | >     frequently needed in
> | >     PR on one node, but another node needs it temporarily in EX to
> | >     update it.
> | >     This may happen, for example, when the rindex is being updated
> | >     by gfs2_grow.
> | >     The gfs2_grow needs to have the lock in EX, but the other nodes
> | >     need to
> | >     re-read it to retrieve the updates. The glock is already
> | >     granted in PR on
> | >     the non-growing nodes, so this prevents them from continually
> | >     re-granting
> | >     the lock in PR, and forces the EX from gfs2_grow to go through.
> | > 
> | > diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> | > index bd5af03..43466d6 100644
> | > --- a/fs/gfs2/lock_dlm.c
> | > +++ b/fs/gfs2/lock_dlm.c
> | > @@ -16,6 +16,25 @@
> | >  #include "glock.h"
> | >  #include "util.h"
> | >  
> | > +/*
> | > + * Borrowed from dlm's lock.c: We need to only pass the QUECVT bit
> | > with
> | > + * requests that are compatible, so we use dlm's check:
> | > + *
> | > + * Compatibility matrix for conversions with QUECVT set.
> | > + * Granted mode is the row; requested mode is the column.
> | > + * Usage: matrix[grmode+1][rqmode+1]
> | > + */
> | > +static const int quecvt_compat_matrix[8][8] = {
> | > +      /* UN NL CR CW PR PW EX PD */
> | > +        {0, 0, 0, 0, 0, 0, 0, 0},       /* UN */
> | > +        {0, 0, 1, 1, 1, 1, 1, 0},       /* NL */
> | > +        {0, 0, 0, 1, 1, 1, 1, 0},       /* CR */
> | > +        {0, 0, 0, 0, 1, 1, 1, 0},       /* CW */
> | > +        {0, 0, 0, 1, 0, 1, 1, 0},       /* PR */
> | > +        {0, 0, 0, 0, 0, 0, 1, 0},       /* PW */
> | > +        {0, 0, 0, 0, 0, 0, 0, 0},       /* EX */
> | > +        {0, 0, 0, 0, 0, 0, 0, 0}        /* PD */
> | > +};
> | >  
> | >  static void gdlm_ast(void *arg)
> | >  {
> | > @@ -104,10 +123,13 @@ static int make_mode(const unsigned int
> | > lmstate)
> | >   return -1;
> | >  }
> | >  
> | > -static u32 make_flags(const u32 lkid, const unsigned int
> | > gfs_flags,
> | > +static u32 make_flags(struct gfs2_glock *gl, const unsigned int
> | > gfs_flags,
> | >                 const int req)
> | >  {
> | >   u32 lkf = 0;
> | > + u32 lkid = gl->gl_lksb.sb_lkid;
> | > + int is_upconvert =
> | > +         quecvt_compat_matrix[make_mode(gl->gl_state) + 1][req + 1];
> | >  
> | >   if (gfs_flags & LM_FLAG_TRY)
> | >           lkf |= DLM_LKF_NOQUEUE;
> | > @@ -131,8 +153,11 @@ static u32 make_flags(const u32 lkid, const
> | > unsigned int gfs_flags,
> | >                   BUG();
> | >   }
> | >  
> | > - if (lkid != 0)
> | > + if (lkid != 0) {
> | >           lkf |= DLM_LKF_CONVERT;
> | > +         if (is_upconvert)
> | > +                 lkf |= DLM_LKF_QUECVT;
> | > + }
> | >  
> | >   lkf |= DLM_LKF_VALBLK;
> | >  
> | > @@ -149,7 +174,7 @@ static unsigned int gdlm_lock(struct gfs2_glock
> | > *gl,
> | >  
> | >   gl->gl_req = req_state;
> | >   req = make_mode(req_state);
> | > - lkf = make_flags(gl->gl_lksb.sb_lkid, flags, req);
> | > + lkf = make_flags(gl, flags, req);
> | >  
> | >   /*
> | >    * Submit the actual lock request.
> | > 
> | 
> | 
> |

Re: [Cluster-devel] [GFS2 PATCH] GFS2: Instruct DLM to avoid queue convert slowdowns

Reply via email to