----- Original Message ----- | Hi, | | On Thu, 2012-04-05 at 12:11 -0400, Bob Peterson wrote: | > Hi, | > | > Here's another patch (explanation below). This patch replies upon | > a DLM patch that hasn't fully gone upstream yet, so perhaps it | > shouldn't be added to the nmw tree until it is. This greatly | > improves the performance of gfs2_grow in a clustered gfs2 file | > system. | > | > Regards, | > | I'm still not very keen on dragging in this bit of dlm. If it is | really | needed, then we should use the copy in the dlm itself and not add our | own copy of it. Also, why can we not use the GLF_BLOCKING flag for | this | purpose? Is there some issue with try locks in that case?
To answer your questions: (1) Yes, this bit is really needed. (2) We cannot use the actual dlm flags because they're part of dlm/lock.c and not in a header. I can request they be moved to a header file, but that's at the whim of upstream dlm. (3) We can't use GLF_BLOCKING because it's a different issue. The issue is this: Unless instructed otherwise, dlm will grant GFS2 (or any caller) locks as it sees fit, and when there is a hotly contested file (such as rindex, once it's been updated by gfs2_grow and needed immediately by all nodes in a cluster) dlm often finds it more convenient to grant the lock to a process on the current owner node rather than bouncing the lock to a different node. That can result in one node "hogging" the lock for long periods of time. That means a gfs2_grow can take 30 minutes to complete rather than 0.3 seconds. To use a farming analogy: your animals are starving because the hogs at the trough are eagerly fighting over the morsel at the bottom, and not giving the farmer enough of a break to put their food in. To avoid this fighting behavior dlm has the DLM_LKF_QUECVT bit which basically says lock upgrades (the farmer) over the convenience of keeping the lock on the same node (the hogs). Employing this bit does wonders for gfs2_grow. However, we can only use the bit on dlm lock upgrades. If we use it in other cases, dlm goes through an error path that results in a hang. | When you say that this relies upon this dlm patch, what does that | mean? (4) It means that there's a DLM bug whereby using the DLM_LKF_QUECVT bit causes DLM (and therefore GFS2) to hang in certain situations. There is a patch to fix this bad DLM behavior but it hasn't gone upstream yet. | What are the consequences of having this patch but not the dlm one? (5) The consequences of having this patch but not the DLM one are disastrous: gfs2 is likely to hang. | I'm | wondering whether I should hold off on this at least until the dlm | one | has been finalised and applied. (6) Yes, that's what we need to do. | Aside from those points though, this is a really big step forward and | should make a measurable difference to performance across the board | when | contention between multiple nodes occurs. So I'm very happy to see | such | significant progress in this area, | | Steve. | | | > Bob Peterson | > Red Hat File Systems | > | > Signed-off-by: Bob Peterson <rpete...@redhat.com> | > --- | > Author: Bob Peterson <rpete...@redhat.com> | > Date: Wed Apr 4 15:14:51 2012 -0500 | > | > GFS2: Instruct DLM to avoid queue convert slowdowns | > | > This patch instructs DLM to prevent an "in place" conversion, | > where the | > lock just stays on the granted queue, and instead forces the | > conversion to | > the back of the convert queue. This is done on upward | > conversions only. | > | > This is useful in cases where, for example, a lock is | > frequently needed in | > PR on one node, but another node needs it temporarily in EX to | > update it. | > This may happen, for example, when the rindex is being updated | > by gfs2_grow. | > The gfs2_grow needs to have the lock in EX, but the other nodes | > need to | > re-read it to retrieve the updates. The glock is already | > granted in PR on | > the non-growing nodes, so this prevents them from continually | > re-granting | > the lock in PR, and forces the EX from gfs2_grow to go through. | > | > diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c | > index bd5af03..43466d6 100644 | > --- a/fs/gfs2/lock_dlm.c | > +++ b/fs/gfs2/lock_dlm.c | > @@ -16,6 +16,25 @@ | > #include "glock.h" | > #include "util.h" | > | > +/* | > + * Borrowed from dlm's lock.c: We need to only pass the QUECVT bit | > with | > + * requests that are compatible, so we use dlm's check: | > + * | > + * Compatibility matrix for conversions with QUECVT set. | > + * Granted mode is the row; requested mode is the column. | > + * Usage: matrix[grmode+1][rqmode+1] | > + */ | > +static const int quecvt_compat_matrix[8][8] = { | > + /* UN NL CR CW PR PW EX PD */ | > + {0, 0, 0, 0, 0, 0, 0, 0}, /* UN */ | > + {0, 0, 1, 1, 1, 1, 1, 0}, /* NL */ | > + {0, 0, 0, 1, 1, 1, 1, 0}, /* CR */ | > + {0, 0, 0, 0, 1, 1, 1, 0}, /* CW */ | > + {0, 0, 0, 1, 0, 1, 1, 0}, /* PR */ | > + {0, 0, 0, 0, 0, 0, 1, 0}, /* PW */ | > + {0, 0, 0, 0, 0, 0, 0, 0}, /* EX */ | > + {0, 0, 0, 0, 0, 0, 0, 0} /* PD */ | > +}; | > | > static void gdlm_ast(void *arg) | > { | > @@ -104,10 +123,13 @@ static int make_mode(const unsigned int | > lmstate) | > return -1; | > } | > | > -static u32 make_flags(const u32 lkid, const unsigned int | > gfs_flags, | > +static u32 make_flags(struct gfs2_glock *gl, const unsigned int | > gfs_flags, | > const int req) | > { | > u32 lkf = 0; | > + u32 lkid = gl->gl_lksb.sb_lkid; | > + int is_upconvert = | > + quecvt_compat_matrix[make_mode(gl->gl_state) + 1][req + 1]; | > | > if (gfs_flags & LM_FLAG_TRY) | > lkf |= DLM_LKF_NOQUEUE; | > @@ -131,8 +153,11 @@ static u32 make_flags(const u32 lkid, const | > unsigned int gfs_flags, | > BUG(); | > } | > | > - if (lkid != 0) | > + if (lkid != 0) { | > lkf |= DLM_LKF_CONVERT; | > + if (is_upconvert) | > + lkf |= DLM_LKF_QUECVT; | > + } | > | > lkf |= DLM_LKF_VALBLK; | > | > @@ -149,7 +174,7 @@ static unsigned int gdlm_lock(struct gfs2_glock | > *gl, | > | > gl->gl_req = req_state; | > req = make_mode(req_state); | > - lkf = make_flags(gl->gl_lksb.sb_lkid, flags, req); | > + lkf = make_flags(gl, flags, req); | > | > /* | > * Submit the actual lock request. | > | | |