Commit:     4106f83a9f86afc423557d0d92ebf4b3f36728c1
Parent:     6cb062296f73e74768cca2f3eaf90deac54de02d
Author:     Andrea Arcangeli <[EMAIL PROTECTED]>
AuthorDate: Tue Oct 16 01:25:42 2007 -0700
Committer:  Linus Torvalds <[EMAIL PROTECTED]>
CommitDate: Tue Oct 16 09:42:59 2007 -0700

    make swappiness safer to use

    Swappiness isn't a safe sysctl.  Setting it to 0, for example, can hang a
    system.  That's a corner case, but even setting it to 10 or lower can waste
    enormous amounts of cpu without making much progress.  We have customers
    who want to use swappiness but can't, because with the current
    implementation, if you change it so the system stops swapping, it really
    stops swapping, and nothing works sanely anymore when something genuinely
    had to be swapped out to make progress.

    This patch from Kurt Garloff makes swappiness safer to use: no more huge
    cpu usage or hangs with low swappiness values.

    I think prev_priority can also be nuked, since it wastes 4 bytes per zone
    (that would be an incremental patch, but I'll wait for nr_scan_[in]active
    to be nuked first, for similar reasons).  Clearly somebody at some point
    noticed how broken that thing was and had to add min(priority,
    prev_priority) to give it some reliability, but they didn't go the last
    mile and nuke prev_priority too.  Calculating distress as a function of
    the non-racy priority alone is correct, and surely more than enough,
    without adding randomness into the equation.

    The patch is tested on older kernels, but it compiles cleanly and is
    quite simple.

    Overall I'm not very satisfied with the swappiness tweak, since it
    doesn't really do anything about dirty pagecache that may sit on the
    inactive list.  We need another kind of tweak that controls the inactive
    scan and tunes the can_writepage feature (not yet in mainline despite
    having been submitted a few times), not only the active one.  That new
    tweak will tell the kernel how hard to scan the inactive list for pure
    clean pagecache (something the mainline kernel isn't capable of yet).
    We already have that feature working in all our enterprise kernels with
    a reasonable default tune; without it they can't even run a read-only
    backup with tar without triggering huge write I/O.  I think it should
    become available in mainline later as well.

    Cc: Nick Piggin <[EMAIL PROTECTED]>
    Signed-off-by: Kurt Garloff <[EMAIL PROTECTED]>
    Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
    Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
 mm/vmscan.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cb8ad3c..bbd1946 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -932,6 +932,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
                long mapped_ratio;
                long distress;
                long swap_tendency;
+               long imbalance;
                if (zone_is_near_oom(zone))
                        goto force_reclaim_mapped;
@@ -967,6 +968,46 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
                swap_tendency = mapped_ratio / 2 + distress + sc->swappiness;
+
+               /*
+                * If there's huge imbalance between active and inactive
+                * (think active 100 times larger than inactive) we should
+                * become more permissive, or the system will take too much
+                * cpu before it start swapping during memory pressure.
+                * Distress is about avoiding early-oom, this is about
+                * making swappiness graceful despite setting it to low
+                * values.
+                *
+                * Avoid div by zero with nr_inactive+1, and max resulting
+                * value is vm_total_pages.
+                */
+               imbalance  = zone_page_state(zone, NR_ACTIVE);
+               imbalance /= zone_page_state(zone, NR_INACTIVE) + 1;
+               /*
+                * Reduce the effect of imbalance if swappiness is low,
+                * this means for a swappiness very low, the imbalance
+                * must be much higher than 100 for this logic to make
+                * the difference.
+                *
+                * Max temporary value is vm_total_pages*100.
+                */
+               imbalance *= (vm_swappiness + 1);
+               imbalance /= 100;
+               /*
+                * If not much of the ram is mapped, makes the imbalance
+                * less relevant, it's high priority we refill the inactive
+                * list with mapped pages only in presence of high ratio of
+                * mapped pages.
+                *
+                * Max temporary value is vm_total_pages*100.
+                */
+               imbalance *= mapped_ratio;
+               imbalance /= 100;
+               /* apply imbalance feedback to swap_tendency */
+               swap_tendency += imbalance;
+
                /*
                 * Now use this metric to decide whether to start moving mapped
                 * memory onto the inactive list.