On Mon, Jan 25, 2016 at 10:39 AM, Raghavendra Gowdappa <[email protected]> wrote:
> > ----- Original Message -----
> > From: "Richard Wareing" <[email protected]>
> > To: "Pranith Kumar Karampuri" <[email protected]>
> > Cc: [email protected]
> > Sent: Monday, January 25, 2016 8:17:11 AM
> > Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)
> >
> > Yup, per domain would be useful; the patch itself currently honors domains as well, so locks in different domains will not be touched during revocation.
> >
> > In our case we actually prefer to pull the plug on the SHD/DHT domains to ensure clients do not hang. This is important for DHT self-heals, which cannot be disabled via any option; we've found that in most cases, once we reap the lock, another properly behaving client comes along and completes the DHT heal properly.
>
> Flushing waiting locks of DHT can affect application continuity too. Though locks requested by the rebalance process can be flushed to a certain extent without applications noticing any failures, there is no guarantee that locks requested in DHT_LAYOUT_HEAL_DOMAIN and DHT_FILE_MIGRATE_DOMAIN are issued only by the rebalance process.

I missed this point in my previous mail. Now I remember that we can use frame->root->pid (being negative) to identify internal processes. Was this the approach you followed to identify locks from the rebalance process?

> These two domains are used for locks to synchronize among and between rebalance process(es) and client(s). So, there is an equal probability that these locks might be requests from clients, and hence applications can see some file operations failing.
>
> In the case of pulling the plug on DHT_LAYOUT_HEAL_DOMAIN, dentry operations that depend on the layout can fail. These operations include create, link, unlink, symlink, mknod, mkdir, and rename for files/directories within the directory on which the lock request failed.
>
> In the case of pulling the plug on DHT_FILE_MIGRATE_DOMAIN, renames of immediate subdirectories/files can fail.
>
> > Richard
> >
> > Sent from my iPhone
> >
> > On Jan 24, 2016, at 6:42 PM, Pranith Kumar Karampuri <[email protected]> wrote:
> >
> > > On 01/25/2016 02:17 AM, Richard Wareing wrote:
> > > >
> > > > Hello all,
> > > >
> > > > Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation feature, which has had a significant impact on our GFS cluster reliability. As such, I wanted to share the patch with the community, so here's the bugzilla report:
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1301401
> > > >
> > > > =====
> > > > Summary:
> > > > Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster instability and eventual complete unavailability due to failures in releasing entry/inode locks in a timely manner.
> > > >
> > > > Classic symptoms of this are increased brick (and/or gNFSd) memory usage due to the high number of (lock request) frames piling up in the processes. The failure mode results in bricks eventually slowing down to a crawl due to swapping, or OOMing due to complete memory exhaustion; during this period the entire cluster can begin to fail. End-users will experience this as hangs on the filesystem, first in a specific region of the filesystem and ultimately the entire filesystem, as the offending brick begins to turn into a zombie (i.e. not quite dead, but not quite alive either).
> > > >
> > > > Currently, these situations must be handled by an administrator detecting & intervening via the "clear-locks" CLI command.
> > > > Unfortunately this doesn't scale for large numbers of clusters, and it depends on the correct (external) detection of the locks piling up (for which there is little signal other than state dumps).
> > > >
> > > > This patch introduces two features to remedy this situation:
> > > >
> > > > 1. Monkey-unlocking - This is a feature targeted at developers (only!) to help track down crashes due to stale locks, and to prove the utility of the lock revocation feature. It does this by silently dropping 1% of unlock requests, simulating bugs or mis-behaving clients.
> > > >
> > > > The feature is activated via:
> > > > features.locks-monkey-unlocking <on/off>
> > > >
> > > > You'll see the message
> > > > "[<timestamp>] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY LOCKING (forcing stuck lock)!"
> > > > ... in the logs indicating a request has been dropped.
> > > >
> > > > 2. Lock revocation - Once enabled, this feature will revoke a *contended* lock (i.e. if nobody else asks for the lock, we will not revoke it) based either on the amount of time the lock has been held, on how many other lock requests are waiting on the lock to be freed, or on some combination of both. Clients which lose their locks will be notified by receiving EAGAIN (sent back to their callback function).
> > > >
> > > > The feature is activated via these options:
> > > > features.locks-revocation-secs <integer; 0 to disable>
> > > > features.locks-revocation-clear-all [on/off]
> > > > features.locks-revocation-max-blocked <integer>
> > > >
> > > > Recommended settings are 1800 seconds for the time-based timeout (give clients the benefit of the doubt). Choosing a max-blocked value requires some experimentation depending on your workload, but generally values of hundreds to low thousands work (it's normal for many tens of locks to be taken out when files are being written at high throughput).
> > >
> > > I really like this feature. One question though: self-heal and rebalance domain locks are active until the self-heal/rebalance is complete, which can take more than 30 minutes if the files are in TBs. I will try to see what we can do to handle these without increasing revocation-secs too much. Maybe we can come up with per-domain revocation timeouts. Comments are welcome.
> > >
> > > Pranith
> > >
> > > > =====
> > > >
> > > > The patch supplied will apply cleanly to the v3.7.6 release tag, and probably to any 3.7.x release & master (the posix locks xlator is rarely touched).
> > > >
> > > > Richard

--
Raghavendra G
_______________________________________________ Gluster-devel mailing list [email protected] http://www.gluster.org/mailman/listinfo/gluster-devel
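
A note for readers on the frame->root->pid convention discussed above: GlusterFS internal daemons (rebalance, the self-heal daemon, etc.) stamp their requests with a negative pid in frame->root->pid, while regular application clients carry a non-negative one. Below is a minimal C sketch, under that assumption, of how a locks-translator policy might key off the sign of the pid; the helper names and the policy hook are illustrative assumptions, not code from the posted patch.

#include <stdbool.h>

#include "stack.h"   /* libglusterfs header declaring call_frame_t / call_stack_t */

/* Negative pids are reserved for gluster-internal processes such as the
 * rebalance and self-heal daemons; application clients have pid >= 0. */
static bool
pl_is_internal_client (call_frame_t *frame)
{
        return (frame->root->pid < 0);
}

/* Hypothetical policy hook: only consider revoking locks whose holder is an
 * internal process, so application-held locks are never reaped. */
static bool
pl_may_revoke_holder (call_frame_t *holder_frame)
{
        return pl_is_internal_client (holder_frame);
}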

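The revocation semantics described in Richard's announcement (only contended locks are candidates, triggered by hold time and/or the number of blocked waiters, with the revoked holder seeing EAGAIN) can be summarized in a small C sketch. The struct names and fields below are stand-ins chosen for illustration, not the actual features/locks data structures from the patch.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct held_lock {                  /* stand-in for the xlator's lock object */
        time_t   granted_at;        /* when the lock was granted */
        uint32_t blocked_waiters;   /* lock requests queued behind it */
};

struct revocation_opts {            /* mirrors the volume options listed above */
        uint32_t revocation_secs;        /* 0 disables the time-based check */
        uint32_t revocation_max_blocked; /* 0 disables the waiter-count check */
};

static bool
should_revoke (const struct held_lock *lk,
               const struct revocation_opts *opt, time_t now)
{
        /* A lock nobody is waiting on is never revoked. */
        if (lk->blocked_waiters == 0)
                return false;

        /* Time-based revocation: held longer than revocation-secs. */
        if (opt->revocation_secs != 0 &&
            (now - lk->granted_at) >= (time_t)opt->revocation_secs)
                return true;

        /* Contention-based revocation: too many blocked requests queued. */
        if (opt->revocation_max_blocked != 0 &&
            lk->blocked_waiters >= opt->revocation_max_blocked)
                return true;

        return false;
}

/* When should_revoke() fires, the holder's lock would be failed back with
 * op_errno = EAGAIN, which is how revoked clients learn they lost the lock. */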