Re: [PATCH 3/3] Provide control over unmapped pages (v5)

2011-03-30 Thread Andrew Morton
On Wed, 30 Mar 2011 11:02:38 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 Changelog v4
 1. Added documentation for max_unmapped_pages
 2. Better #ifdef'ing of max_unmapped_pages and min_unmapped_pages
 
 Changelog v2
 1. Use a config option to enable the code (Andrew Morton)
 2. Explain the magic tunables in the code or at-least attempt
to explain them (General comment)
 3. Hint uses of the boot parameter with unlikely (Andrew Morton)
 4. Use better names (balanced is not a good naming convention)
 
 Provide control using zone_reclaim() and a boot parameter. The
 code reuses functionality from zone_reclaim() to isolate unmapped
 pages and reclaim them as a priority, ahead of other mapped pages.
 

This:

akpm:/usr/src/25 grep '^+#' 
patches/provide-control-over-unmapped-pages-v5.patch 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#else
+#endif
+#ifdef CONFIG_NUMA
+#else
+#define zone_reclaim_mode 0
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)

is getting out of control.  What happens if we just make the feature
non-configurable?

 +static int __init unmapped_page_control_parm(char *str)
 +{
 + unmapped_page_control = 1;
 + /*
 +  * XXX: Should we tweak swappiness here?
 +  */
 + return 1;
 +}
 +__setup(unmapped_page_control, unmapped_page_control_parm);

That looks like a pain - it requires a reboot to change the option,
which makes testing harder and slower.  Methinks you're being a bit
virtualization-centric here!

 +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
 +static inline void reclaim_unmapped_pages(int priority,
 + struct zone *zone, struct scan_control *sc)
 +{
 + return 0;
 +}
 +#endif
 +
  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 struct scan_control *sc)
  {
 @@ -2371,6 +2394,12 @@ loop_again:
   shrink_active_list(SWAP_CLUSTER_MAX, zone,
   sc, priority, 0);
  
 + /*
 +  * We do unmapped page reclaim once here and once
 +  * below, so that we don't lose out
 +  */
 + reclaim_unmapped_pages(priority, zone, sc);

Doing this here seems wrong.  balance_pgdat() does two passes across
the zones.  The first pass is a read-only work-out-what-to-do pass and
the second pass is a now-reclaim-some-stuff pass.  But here we've stuck
a do-some-reclaiming operation inside the first, work-out-what-to-do pass.


 @@ -2408,6 +2437,11 @@ loop_again:
   continue;
  
   sc.nr_scanned = 0;
 + /*
 +  * Reclaim unmapped pages upfront, this should be
 +  * really cheap

Comment is mysterious.  Why is it cheap?

 +  */
 + reclaim_unmapped_pages(priority, zone, sc);


I dunno, the whole thing seems rather nasty to me.

It sticks a magical reclaim-unmapped-pages operation right in the
middle of regular page reclaim.  This means that reclaim will walk the
LRU looking at mapped and unmapped pages.  Then it will walk some more,
looking at only unmapped pages and moving the mapped ones to the head
of the LRU.  Then it goes back to looking at mapped and unmapped pages.
So it rather screws up the LRU ordering and page aging, does it not?

Also, the special-case handling sticks out like a sore thumb.  Would it
not be better to manage the mapped/unmapped bias within the core of the
regular scanning?  ie: in shrink_page_list().
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Provide control over unmapped pages (v5)

2011-03-30 Thread Balbir Singh
* Andrew Morton a...@linux-foundation.org [2011-03-30 16:35:45]:

 On Wed, 30 Mar 2011 11:02:38 +0530
 Balbir Singh bal...@linux.vnet.ibm.com wrote:
 
  Changelog v4
  1. Added documentation for max_unmapped_pages
  2. Better #ifdef'ing of max_unmapped_pages and min_unmapped_pages
  
  Changelog v2
  1. Use a config option to enable the code (Andrew Morton)
  2. Explain the magic tunables in the code or at-least attempt
 to explain them (General comment)
  3. Hint uses of the boot parameter with unlikely (Andrew Morton)
  4. Use better names (balanced is not a good naming convention)
  
  Provide control using zone_reclaim() and a boot parameter. The
  code reuses functionality from zone_reclaim() to isolate unmapped
  pages and reclaim them as a priority, ahead of other mapped pages.
  
 
 This:
 
 akpm:/usr/src/25 grep '^+#' 
 patches/provide-control-over-unmapped-pages-v5.patch 
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#else
 +#endif
 +#ifdef CONFIG_NUMA
 +#else
 +#define zone_reclaim_mode 0
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 
 is getting out of control.  What happens if we just make the feature
 non-configurable?


I added the configuration based on review comments I received. If the
feature is made non-configurable, it should be easy to remove them or
just set the default value to y in the config.
 
  +static int __init unmapped_page_control_parm(char *str)
  +{
  +   unmapped_page_control = 1;
  +   /*
  +* XXX: Should we tweak swappiness here?
  +*/
  +   return 1;
  +}
  +__setup(unmapped_page_control, unmapped_page_control_parm);
 
 That looks like a pain - it requires a reboot to change the option,
 which makes testing harder and slower.  Methinks you're being a bit
 virtualization-centric here!

:-) The reason for the boot parameter is to ensure that people know
what they are doing.

 
  +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
  +static inline void reclaim_unmapped_pages(int priority,
  +   struct zone *zone, struct scan_control *sc)
  +{
  +   return 0;
  +}
  +#endif
  +
   static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
struct scan_control *sc)
   {
  @@ -2371,6 +2394,12 @@ loop_again:
  shrink_active_list(SWAP_CLUSTER_MAX, zone,
  sc, priority, 0);
   
  +   /*
  +* We do unmapped page reclaim once here and once
  +* below, so that we don't lose out
  +*/
  +   reclaim_unmapped_pages(priority, zone, sc);
 
 Doing this here seems wrong.  balance_pgdat() does two passes across
 the zones.  The first pass is a read-only work-out-what-to-do pass and
 the second pass is a now-reclaim-some-stuff pass.  But here we've stuck
 a do-some-reclaiming operation inside the first, work-out-what-to-do pass.


The reason is primarily for balancing, zone_watermark's do not give us
a good idea of whether unmapped pages are balanced, hence the code.
 
 
  @@ -2408,6 +2437,11 @@ loop_again:
  continue;
   
  sc.nr_scanned = 0;
  +   /*
  +* Reclaim unmapped pages upfront, this should be
  +* really cheap
 
 Comment is mysterious.  Why is it cheap?

Cheap because we do a quick check to see if unmapped pages exceed a
threshold. If selective users enable this functionality (which is
expected), the use case is primarily for embedded and virtualization
folks, this should be a simple check.

 
  +*/
  +   reclaim_unmapped_pages(priority, zone, sc);
 
 
 I dunno, the whole thing seems rather nasty to me.
 
 It sticks a magical reclaim-unmapped-pages operation right in the
 middle of regular page reclaim.  This means that reclaim will walk the
 LRU looking at mapped and unmapped pages.  Then it will walk some more,
 looking at only unmapped pages 

[PATCH 3/3] Provide control over unmapped pages (v5)

2011-03-29 Thread Balbir Singh
Changelog v4
1. Added documentation for max_unmapped_pages
2. Better #ifdef'ing of max_unmapped_pages and min_unmapped_pages

Changelog v2
1. Use a config option to enable the code (Andrew Morton)
2. Explain the magic tunables in the code or at-least attempt
   to explain them (General comment)
3. Hint uses of the boot parameter with unlikely (Andrew Morton)
4. Use better names (balanced is not a good naming convention)

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh bal...@linux.vnet.ibm.com
Reviewed-by: Christoph Lameter c...@linux.com
---
 Documentation/kernel-parameters.txt |8 +++
 Documentation/sysctl/vm.txt |   19 +++-
 include/linux/mmzone.h  |7 +++
 include/linux/swap.h|   25 --
 init/Kconfig|   12 +
 kernel/sysctl.c |   13 +
 mm/page_alloc.c |   29 
 mm/vmscan.c |   88 +++
 8 files changed, 194 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index d4e67a5..f522c34 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2520,6 +2520,14 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
[X86]
Set unknown_nmi_panic=1 early on boot.
 
+   unmapped_page_control
+   [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
+   is enabled. It controls the amount of unmapped memory
+   that is present in the system. This boot option plus
+   vm.min_unmapped_ratio (sysctl) provide granular control
+   over how much unmapped page cache can exist in the 
system
+   before kswapd starts reclaiming unmapped page cache 
pages.
+
usbcore.autosuspend=
[USB] The autosuspend time delay (in seconds) used
for newly-detected USB devices (default 2).  This
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 30289fa..1c722f7 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -381,11 +381,14 @@ and may not be fast.
 
 min_unmapped_ratio:
 
-This is available only on NUMA kernels.
+This is available only on NUMA kernels or when unmapped page cache
+control is enabled.
 
 This is a percentage of the total pages in each zone. Zone reclaim will
 only occur if more than this percentage of pages are in a state that
-zone_reclaim_mode allows to be reclaimed.
+zone_reclaim_mode allows to be reclaimed. If unmapped page cache control
+is enabled, this is the minimum level to which the cache will be shrunk
+down to.
 
 If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
 against all file-backed unmapped pages including swapcache pages and tmpfs
@@ -396,6 +399,18 @@ The default is 1 percent.
 
 ==
 
+max_unmapped_ratio:
+
+This is available only when unmapped page cache control is enabled.
+
+This is a percentage of the total pages in each zone. Zone reclaim will
+only occur if more than this percentage of pages are in a state and
+unmapped page cache control is enabled.
+
+The default is 16 percent.
+
+==
+
 mmap_min_addr
 
 This file indicates the amount of address space  which a user process will
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 59cbed0..caa29ad 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -309,7 +309,12 @@ struct zone {
/*
 * zone reclaim becomes active if more unmapped pages exist.
 */
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
unsigned long   min_unmapped_pages;
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+   unsigned long   max_unmapped_pages;
+#endif
 #ifdef CONFIG_NUMA
int node;
unsigned long   min_slab_pages;
@@ -776,6 +781,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct 
ctl_table *, int,
void __user *, size_t *, loff_t *);
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+int sysctl_max_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
+   void __user *, size_t *, loff_t *);
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ce8f686..86cafc5 100644