On Wed, Dec 02, 2015 at 11:42:13AM -0500, Bob Peterson wrote:
> ----- Original Message -----
> (snip)
> > Please take a look at this
> > again and figure out what the problematic cycle of events is, and then
> > work out how to avoid that happening in the first place. There is no
> > point in replacing one problem with another one, particularly one which
> > would likely be very tricky to debug,
> >
> > Steve.
>
> The problematic cycle of events is well known:
> gfs2_clear_inode calls gfs2_glock_put() for the inode's glock,
> but if it's the very last put, it calls into dlm, which can block,
> and that's where we get into trouble.
>
> The livelock goes like this:
>
> 1. A fence operation needs memory, so it blocks on memory allocation.
> 2. Memory allocation blocks on slab shrinker.
> 3. Slab shrinker calls into vfs inode shrinker to free inodes from memory.
....
> 7. dlm blocks on a pending fence operation. Goto 1.

Therefore, the fence operation should be doing GFP_NOFS allocations
to prevent re-entry into the DLM via the filesystem via the
shrinker....

Cheers,

Dave.
-- 
Dave Chinner
dchin...@redhat.com
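
To make the suggestion concrete, here is a minimal userspace sketch of
the cycle and of how a GFP_NOFS-style allocation breaks it. It is a toy
model, not kernel code: every toy_* name, both flag values, and the
single "fence_pending" bit are assumptions made up for illustration, and
the steps elided above are collapsed into one function. The only idea
taken from the thread is that an allocation which does not permit
filesystem reclaim never reaches the inode shrinker, so it cannot end up
waiting in dlm on the fence it belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for GFP flags; the values are arbitrary. */
enum toy_gfp { TOY_GFP_NOFS = 0, TOY_GFP_FS = 1 };

struct toy_state {
    bool fence_pending;  /* a fence operation is in progress */
    int  fs_reentries;   /* times reclaim called back into the filesystem */
};

/*
 * Models the shrinker path: the inode shrinker evicts an inode, the
 * final glock put calls into dlm, and dlm waits while a fence is
 * pending. Returns false when the caller would block.
 */
bool toy_fs_shrinker(struct toy_state *s)
{
    s->fs_reentries++;
    return !s->fence_pending;
}

/*
 * Models an allocation under memory pressure. Filesystem shrinkers are
 * only run when the caller's flags allow filesystem reclaim.
 */
bool toy_alloc_under_pressure(struct toy_state *s, unsigned flags)
{
    if (flags & TOY_GFP_FS)
        return toy_fs_shrinker(s);
    return true;  /* reclaim skips the filesystem entirely */
}

/* Models the fence operation; returns true if it can complete. */
bool toy_fence(struct toy_state *s, unsigned flags)
{
    bool ok;

    s->fence_pending = true;
    ok = toy_alloc_under_pressure(s, flags);
    if (ok)
        s->fence_pending = false;
    return ok;
}

/* Small self-check that runs both variants and reports the outcome. */
void toy_demo(void)
{
    struct toy_state fs = {0}, nofs = {0};

    assert(!toy_fence(&fs, TOY_GFP_FS));     /* stuck behind itself */
    assert(toy_fence(&nofs, TOY_GFP_NOFS));  /* completes */
    printf("FS re-entries: %d, NOFS re-entries: %d\n",
           fs.fs_reentries, nofs.fs_reentries);
}
```

In the model, toy_fence() with TOY_GFP_FS re-enters the filesystem once
and cannot finish, while the TOY_GFP_NOFS variant never enters the
shrinker at all. In the kernel proper, the corresponding change would be
passing GFP_NOFS instead of GFP_KERNEL at the allocation sites on the
fence path; which sites those are is not stated in this thread.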