On Thu, Feb 14, 2013 at 12:39:26PM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2013 12:03:49 +0000
> Mel Gorman <mgor...@suse.de> wrote:
> 
> > Rob van der Heij reported the following (paraphrased) in private mail.
> > 
> >     The scenario is that I want to avoid backups filling up the page
> >     cache and purging stuff that is more likely to be used again (this is
> >     with s390x Linux on z/VM, so I don't give it so much memory that
> >     we don't care anymore). So I have something with LD_PRELOAD that
> >     intercepts the close() call (from tar, in this case) and issues
> >     a posix_fadvise() just before closing the file.
> > 
> >     This mostly works, except for small files (less than 14 pages)
> >     that remain in page cache after the fact.
> 
> Sigh.  We've had the "my backups swamp pagecache" thing for 15 years
> and it's still happening.
> 

Yes. There have been variations of it too such as applications being pushed
prematurely into swap. I'm not certain how well we currently handle that
because I haven't checked in a few months.

> It should be possible nowadays to toss your backup application into a
> container to constrain its pagecache usage.  So we can type
> 
>       run-in-a-memcg -m 200MB /my/backup/program
> 
> and voila.  Does such a script exist and work?
> 

Michal already gave an example. It might work slower if the backup
application has to stall in direct reclaim to keep the container within
limits though.

> > --- a/mm/fadvise.c
> > +++ b/mm/fadvise.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/fadvise.h>
> >  #include <linux/writeback.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/swap.h>
> >  
> >  #include <asm/unistd.h>
> >  
> > @@ -120,9 +121,22 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >             start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
> >             end_index = (endbyte >> PAGE_CACHE_SHIFT);
> >  
> > -           if (end_index >= start_index)
> > -                   invalidate_mapping_pages(mapping, start_index,
> > +           if (end_index >= start_index) {
> > +                   unsigned long count = invalidate_mapping_pages(mapping,
> > +                                           start_index, end_index);
> > +
> > +                   /*
> > +                    * If fewer pages were invalidated than expected then
> > +                    * it is possible that some of the pages were on
> > +                    * a per-cpu pagevec for a remote CPU. Drain all
> > +                    * pagevecs and try again.
> > +                    */
> > +                   if (count < (end_index - start_index + 1)) {
> > +                           lru_add_drain_all();
> > +                           invalidate_mapping_pages(mapping, start_index,
> >                                             end_index);
> > +                   }
> > +           }
> >             break;
> >     default:
> >             ret = -EINVAL;
> 
> Those LRU pagevecs are a right pain.  They provided useful gains way
> back when I first inflicted them upon Linux, but it would be nice to
> confirm whether they're still worthwhile and if so, whether the
> benefits can be replicated with some less intrusive scheme.
> 

I know. Unfortunately I've had "Implement pagevec removal and test" on my
TODO list for the guts of a year now. It's long overdue to actually sit down
and just do it. It's a similar story for the per-cpu lists in front of the
page allocator, which are overdue a check on whether they can be replaced.
I actually have a prototype replacement for that lying around, but it ran
slower in tests and has bit-rotted since, as it was based on kernel 3.4.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
