On Sat, 19 Jul 2014 16:20:01 -0400 Milosz Tanski wrote:

> Neil,
> 
> I saw your recent patchset for improving the wait_on_bit interface
> (in particular: SCHED: allow wait_on_bit_action functions to support a
> timeout.) I'm looking for some guidance on leveraging that work to
> solve another recursive lock hang in fscache.
> 
> I've run into issues similar to the ones you're trying to solve with
> loopback NFS, but in the fscache code. This happens under heavy VM
> pressure when the kernel is aggressively trying to trim the page cache.
> 
> The hang is caused by this series of events:
> 1. cachefiles_write_page - cachefiles (the fscache backend, sitting on
> ext4) tries to write a page to disk
> 2. ext4 tries to allocate a page in writeback (without GFP_NOFS and
> with the wait flag)
> 3. due to VM pressure the kernel tries to free up pages
> 4. this causes release pages in ceph to be called
> 5. the selected page is a cached page in the process of write-out (from step #1)
> 6. fscache_wait_on_page_write hangs forever
> 
> Is there a solution that you have for NFS, such as another patch that
> implements the timeout, that I can use as a template? I'm not familiar
> with that piece of the code base.

It looks like the comment in __fscache_maybe_release_page

        /* We will wait here if we're allowed to, but that could deadlock the
         * allocator as the work threads writing to the cache may all end up
         * sleeping on memory allocation, so we may need to impose a timeout
         * too. */

is correct when it says "we may need to impose a timeout".
The __fscache_wait_on_page_write() call that follows it needs to time out.

However that doesn't use wait_on_bit(), it just has a simple wait_event.
So something like this should fix it (or should at least move the problem
along a bit).

NeilBrown



diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index ed70714503fa..58035024c5cf 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -43,6 +43,13 @@ void __fscache_wait_on_page_write(struct fscache_cookie *cookie, struct page *pa
 }
 EXPORT_SYMBOL(__fscache_wait_on_page_write);
 
+void __fscache_wait_on_page_write_timeout(struct fscache_cookie *cookie, struct page *page, unsigned long timeout)
+{
+       wait_queue_head_t *wq = bit_waitqueue(&cookie->flags, 0);
+
+       wait_event_timeout(*wq, !__fscache_check_page_write(cookie, page), timeout);
+}
+
 /*
  * decide whether a page can be released, possibly by cancelling a store to it
  * - we're allowed to sleep if __GFP_WAIT is flagged
@@ -115,7 +122,7 @@ page_busy:
        }
 
        fscache_stat(&fscache_n_store_vmscan_wait);
-       __fscache_wait_on_page_write(cookie, page);
+       __fscache_wait_on_page_write_timeout(cookie, page, HZ);
        gfp &= ~__GFP_WAIT;
        goto try_again;
 }


