This patch makes the kernel unconditionally cache non-empty unreferenced objects instead of enforcing a fixed, arbitrary limit. As the pageout daemon evicts pages, it collects cached objects that have become empty. The net effect is a graceful adjustment of the number of objects involved in memory management (virtual memory objects, their associated ports, and potentially the objects maintained by external memory managers). Physical memory can now be almost entirely filled with cached pages. In addition, cached pages are no longer automatically deactivated, since their objects can quickly be referenced again.
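To summarize the rule before the diff: an unreferenced object stays cached as long as it still has resident pages, and the pageout path terminates it once its last page is gone. Below is a standalone toy model of that policy in plain C. It is illustrative only, not kernel code; struct toy_object and every function name here are made up for the example, while the collectable() predicate mirrors the vm_object_collectable() macro added in vm/vm_object.h.

/*
 * Toy userspace model of the cache policy introduced by this patch.
 * All names are illustrative; this is not the code from the diff.
 */
#include <assert.h>
#include <stdio.h>

struct toy_object {
        int             ref_count;
        unsigned long   resident_page_count;
        int             cached;         /* on the cached list? */
};

/* Mirrors the vm_object_collectable() predicate from the patch. */
static int collectable(const struct toy_object *o)
{
        return o->ref_count == 0 && o->resident_page_count == 0;
}

/* Last reference dropped: cache the object if it still has pages. */
static void deallocate(struct toy_object *o)
{
        assert(o->ref_count > 0);
        if (--o->ref_count == 0 && o->resident_page_count > 0)
                o->cached = 1;  /* kept around, no arbitrary limit */
}

/* Pageout evicts one page; collect the object once it becomes empty. */
static void pageout_one(struct toy_object *o)
{
        assert(o->resident_page_count > 0);
        o->resident_page_count--;
        if (o->cached && collectable(o)) {
                o->cached = 0;  /* stands in for vm_object_collect() */
                printf("object collected\n");
        }
}

int main(void)
{
        struct toy_object o = { .ref_count = 1, .resident_page_count = 2 };

        deallocate(&o);         /* unreferenced but non-empty: cached */
        printf("cached=%d\n", o.cached);
        pageout_one(&o);        /* one page left: stays cached */
        pageout_one(&o);        /* last page gone: collected */
        return 0;
}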
There are problems with this patch, however. The first is that, on machines with a large amount of physical memory (above 1 GiB, although it also depends on usage patterns), scalability issues are exposed. For example, file systems that don't throttle their writeback requests can create thread storms, strongly reducing system responsiveness. Other issues, such as linear scans of memory objects, also add visible CPU overhead. The second is that, since most memory ends up in use, the chances of a swapping deadlock increase. The pageout daemon's priority and the thresholds used to wake it have been raised to help with that (roughly, the free page target grows from about 1.25% to 10% of the initially free pages, and the wakeup minimum from about 1% to 8%; a worked example follows the patch), and in practice this hack works well for most workloads. Applications that map large objects and quickly cause lots of page faults can still easily bring the system to its knees.

This patch should improve performance a lot on real hardware, but it can slightly reduce it on virtualized systems with a lot of physical memory, where disk accesses go through the host cache.

Note that you need up-to-date versions of both GNU Mach [1] and the Hurd [2] to test this patch, as it relies on important bug fixes to avoid file system crashes and kernel panics. Remember that, in addition to vmstat, you can monitor the state of the slab allocator with slabinfo [3].

Feedback is very welcome.

[1] git commit 8d219eab0dcfbdcf464340630d568c4e16d7acbd
[2] git commit 2f4f65ce9137aab6acaf1004bacc09d3a975d881
[3] http://git.sceen.net/rbraun/slabinfo.git/
---
 vm/vm_object.c   | 177 ++++++++++++++++--------------------------------------
 vm/vm_object.h   |   9 ++-
 vm/vm_pageout.c  |  12 +++-
 vm/vm_resident.c |   2 +
 4 files changed, 71 insertions(+), 129 deletions(-)

diff --git a/vm/vm_object.c b/vm/vm_object.c
index f101708..15c9ac0 100644
--- a/vm/vm_object.c
+++ b/vm/vm_object.c
@@ -61,8 +61,6 @@ void memory_object_release(
 	pager_request_t	pager_request,
 	ipc_port_t	pager_name);	/* forward */
 
-void		vm_object_deactivate_pages(vm_object_t);
-
 /*
  *	Virtual memory objects maintain the actual data
  *	associated with allocated virtual memory.  A given
@@ -141,7 +139,11 @@ void vm_object_deactivate_pages(vm_object_t);
  *	ZZZ	Continue this comment.
  */
 
-struct kmem_cache	vm_object_cache; /* vm backing store cache */
+/*
+ *	Cache used for vm_object allocations, not to be confused
+ *	with the cache of persisting virtual memory objects.
+ */
+struct kmem_cache	vm_object_cache;
 
 /*
  *	All wired-down kernel memory belongs to a single virtual
@@ -163,8 +165,9 @@ vm_object_t	kernel_object = &kernel_object_store;
  *
  *	The kernel may choose to terminate objects from this
  *	queue in order to reclaim storage.  The current policy
- *	is to permit a fixed maximum number of unreferenced
- *	objects (vm_object_cached_max).
+ *	is to let memory pressure dynamically adjust the number
+ *	of unreferenced objects.  The pageout daemon attempts to
+ *	collect objects after removing pages from them.
 *
 *	A simple lock (accessed by routines
 *	vm_object_cache_{lock,lock_try,unlock}) governs the
@@ -179,8 +182,6 @@ vm_object_t	kernel_object = &kernel_object_store;
 *	not be held to make simple references.
 */
 queue_head_t	vm_object_cached_list;
-int		vm_object_cached_count;
-int		vm_object_cached_max = 4000;	/* may be patched*/
 
 decl_simple_lock_data(,vm_object_cached_lock_data)
 
@@ -334,6 +335,33 @@ void vm_object_init(void)
 				IKOT_PAGING_NAME);
 }
 
+void vm_object_collect(
+	register vm_object_t	object)
+{
+	vm_object_unlock(object);
+
+	/*
+	 *	The cache lock must be acquired in the proper order.
+	 */
+
+	vm_object_cache_lock();
+	vm_object_lock(object);
+
+	/*
+	 *	If the object was referenced while the lock was
+	 *	dropped, cancel the termination.
+	 */
+
+	if (!vm_object_collectable(object)) {
+		vm_object_unlock(object);
+		vm_object_cache_unlock();
+		return;
+	}
+
+	queue_remove(&vm_object_cached_list, object, vm_object_t, cached_list);
+	vm_object_terminate(object);
+}
+
 /*
 *	vm_object_reference:
 *
@@ -394,102 +422,31 @@ void vm_object_deallocate(
 
 		/*
 		 *	See whether this object can persist.  If so, enter
-		 *	it in the cache, then deactivate all of its
-		 *	pages.
+		 *	it in the cache.
 		 */
-		if (object->can_persist) {
-			boolean_t	overflow;
-
-			/*
-			 *	Enter the object onto the queue
-			 *	of "cached" objects.  Remember whether
-			 *	we've caused the queue to overflow,
-			 *	as a hint.
-			 */
-
+		if (object->can_persist && (object->resident_page_count > 0)) {
 			queue_enter(&vm_object_cached_list, object,
 				vm_object_t, cached_list);
-			overflow = (++vm_object_cached_count > vm_object_cached_max);
 			vm_object_cache_unlock();
-
-			vm_object_deactivate_pages(object);
 			vm_object_unlock(object);
+			return;
+		}
 
-			/*
-			 *	If we didn't overflow, or if the queue has
-			 *	been reduced back to below the specified
-			 *	minimum, then quit.
-			 */
-			if (!overflow)
-				return;
-
-			while (TRUE) {
-				vm_object_cache_lock();
-				if (vm_object_cached_count <=
-				    vm_object_cached_max) {
-					vm_object_cache_unlock();
-					return;
-				}
-
-				/*
-				 *	If we must trim down the queue, take
-				 *	the first object, and proceed to
-				 *	terminate it instead of the original
-				 *	object.  Have to wait for pager init.
-				 *	if it's in progress.
-				 */
-				object= (vm_object_t)
-				    queue_first(&vm_object_cached_list);
-				vm_object_lock(object);
-
-				if (!(object->pager_created &&
-				    !object->pager_initialized)) {
-
-					/*
-					 *	Ok to terminate, hang on to lock.
-					 */
-					break;
-				}
-
-				vm_object_assert_wait(object,
-					VM_OBJECT_EVENT_INITIALIZED, FALSE);
-				vm_object_unlock(object);
-				vm_object_cache_unlock();
-				thread_block((void (*)()) 0);
-
-				/*
-				 *	Continue loop to check if cache still
-				 *	needs to be trimmed.
-				 */
-			}
+		if (object->pager_created &&
+		    !object->pager_initialized) {
 
 			/*
-			 *	Actually remove object from cache.
+			 *	Have to wait for initialization.
+			 *	Put reference back and retry
+			 *	when it's initialized.
 			 */
-
-			queue_remove(&vm_object_cached_list, object,
-				vm_object_t, cached_list);
-			vm_object_cached_count--;
-
-			assert(object->ref_count == 0);
-		}
-		else {
-			if (object->pager_created &&
-			    !object->pager_initialized) {
-
-				/*
-				 *	Have to wait for initialization.
-				 *	Put reference back and retry
-				 *	when it's initialized.
-				 */
-				object->ref_count++;
-				vm_object_assert_wait(object,
-					VM_OBJECT_EVENT_INITIALIZED, FALSE);
-				vm_object_unlock(object);
-				vm_object_cache_unlock();
-				thread_block((void (*)()) 0);
-				continue;
-			}
+			object->ref_count++;
+			vm_object_assert_wait(object,
+				VM_OBJECT_EVENT_INITIALIZED, FALSE);
+			vm_object_unlock(object);
+			vm_object_cache_unlock();
+			thread_block((void (*)()) 0);
+			continue;
 		}
 
 		/*
@@ -516,8 +473,6 @@ void vm_object_deallocate(
 	}
 }
 
-boolean_t	vm_object_terminate_remove_all = FALSE;
-
 /*
 *	Routine:	vm_object_terminate
 *	Purpose:
@@ -870,28 +825,6 @@ kern_return_t memory_object_destroy(
 }
 
 /*
- *	vm_object_deactivate_pages
- *
- *	Deactivate all pages in the specified object.  (Keep its pages
- *	in memory even though it is no longer referenced.)
- *
- *	The object must be locked.
- */
-void vm_object_deactivate_pages(
-	register vm_object_t	object)
-{
-	register vm_page_t	p;
-
-	queue_iterate(&object->memq, p, vm_page_t, listq) {
-		vm_page_lock_queues();
-		if (!p->busy)
-			vm_page_deactivate(p);
-		vm_page_unlock_queues();
-	}
-}
-
-
-/*
 *	Routine:	vm_object_pmap_protect
 *
 *	Purpose:
@@ -1859,7 +1792,6 @@ vm_object_t vm_object_lookup(
 		if (object->ref_count == 0) {
 			queue_remove(&vm_object_cached_list, object,
 				vm_object_t, cached_list);
-			vm_object_cached_count--;
 		}
 
 		object->ref_count++;
@@ -1890,7 +1822,6 @@ vm_object_t vm_object_lookup_name(
 		if (object->ref_count == 0) {
 			queue_remove(&vm_object_cached_list, object,
 				vm_object_t, cached_list);
-			vm_object_cached_count--;
 		}
 
 		object->ref_count++;
@@ -1926,7 +1857,6 @@ void vm_object_destroy(
 	if (object->ref_count == 0) {
 		queue_remove(&vm_object_cached_list, object,
 			vm_object_t, cached_list);
-		vm_object_cached_count--;
 	}
 
 	object->ref_count++;
@@ -2079,7 +2009,6 @@ restart:
 	if (object->ref_count == 0) {
 		queue_remove(&vm_object_cached_list, object,
 			vm_object_t, cached_list);
-		vm_object_cached_count--;
 	}
 	object->ref_count++;
 	vm_object_unlock(object);
@@ -2743,7 +2672,7 @@ void vm_object_page_remove(
 	 *	It balances vm_object_lookup vs iteration.
 	 */
 
-	if (atop(end - start) < (unsigned)object->resident_page_count/16) {
+	if (atop(end - start) < object->resident_page_count/16) {
 		vm_object_page_remove_lookup++;
 
 		for (; start < end; start += PAGE_SIZE) {
@@ -2970,7 +2899,7 @@ void vm_object_print(
 
 	iprintf("Object 0x%X: size=0x%X",
 		(vm_offset_t) object, (vm_offset_t) object->size);
-	printf(", %d references, %d resident pages,", object->ref_count,
+	printf(", %d references, %lu resident pages,", object->ref_count,
 		object->resident_page_count);
 	printf(" %d absent pages,", object->absent_count);
 	printf(" %d paging ops\n", object->paging_in_progress);
diff --git a/vm/vm_object.h b/vm/vm_object.h
index c992570..cfd8a72 100644
--- a/vm/vm_object.h
+++ b/vm/vm_object.h
@@ -71,8 +71,8 @@ struct vm_object {
 					 * if internal)
 					 */
 
-	short		ref_count;	/* Number of references */
-	short		resident_page_count;
+	int		ref_count;	/* Number of references */
+	unsigned long	resident_page_count;
 					/* number of resident pages */
 
 	struct vm_object *copy;	/* Object that should receive
@@ -169,6 +169,7 @@ vm_object_t	kernel_object;	/* the single kernel object */
 
 extern void		vm_object_bootstrap(void);
 extern void		vm_object_init(void);
+extern void		vm_object_collect(vm_object_t);
 extern void		vm_object_terminate(vm_object_t);
 extern vm_object_t	vm_object_allocate(vm_size_t);
 extern void		vm_object_reference(vm_object_t);
@@ -282,6 +283,10 @@ extern void	vm_object_pager_wakeup(ipc_port_t pager);
 *	Routines implemented as macros
 */
 
+#define vm_object_collectable(object)				\
+	(((object)->ref_count == 0)				\
+	 && ((object)->resident_page_count == 0))
+
 #define	vm_object_paging_begin(object) 				\
 	((object)->paging_in_progress++)
 
diff --git a/vm/vm_pageout.c b/vm/vm_pageout.c
index 77c1cfe..8d7492d 100644
--- a/vm/vm_pageout.c
+++ b/vm/vm_pageout.c
@@ -98,7 +98,7 @@
 */
 
 #ifndef	VM_PAGE_FREE_TARGET
-#define	VM_PAGE_FREE_TARGET(free)	(15 + (free) / 80)
+#define	VM_PAGE_FREE_TARGET(free)	(15 + (free) * 10 / 100)
 #endif	/* VM_PAGE_FREE_TARGET */
 
 /*
@@ -107,7 +107,7 @@
 */
 
 #ifndef	VM_PAGE_FREE_MIN
-#define	VM_PAGE_FREE_MIN(free)	(10 + (free) / 100)
+#define	VM_PAGE_FREE_MIN(free)	(10 + (free) * 8 / 100)
 #endif	/* VM_PAGE_FREE_MIN */
 
 /* When vm_page_external_count exceeds vm_page_external_limit,
@@ -750,7 +750,12 @@ void vm_pageout_scan()
 			reclaim_page:
 				vm_page_free(m);
 				vm_page_unlock_queues();
-				vm_object_unlock(object);
+
+				if (vm_object_collectable(object))
+					vm_object_collect(object);
+				else
+					vm_object_unlock(object);
+
 				continue;
 			}
 
@@ -915,6 +920,7 @@ void vm_pageout()
 	current_thread()->vm_privilege = TRUE;
 	stack_privilege(current_thread());
 
+	thread_set_own_priority(0);
 
 	/*
 	 *	Initialize some paging parameters.
diff --git a/vm/vm_resident.c b/vm/vm_resident.c
index ae71a74..bdc7401 100644
--- a/vm/vm_resident.c
+++ b/vm/vm_resident.c
@@ -517,6 +517,7 @@ void vm_page_insert(
 	 */
 
 	object->resident_page_count++;
+	assert(object->resident_page_count != 0);
 
 	/*
 	 *	Detect sequential access and inactivate previous page.
@@ -616,6 +617,7 @@ void vm_page_replace(
 	 */
 
 	object->resident_page_count++;
+	assert(object->resident_page_count != 0);
 }
 
 /*
-- 
1.7.10.4
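To give a sense of scale for the VM_PAGE_FREE_TARGET and VM_PAGE_FREE_MIN changes above, here is a standalone computation in plain C. The formulas are copied from the hunks in vm/vm_pageout.c; the 512 MiB machine with 4 KiB pages is just an assumed example, not a measurement.

/*
 * Old vs. new pageout thresholds, using the macros' formulas.
 * Standalone example, not kernel code.
 */
#include <stdio.h>

int main(void)
{
        /* Assume 512 MiB of 4 KiB pages initially free at boot. */
        unsigned long free = (512UL << 20) / 4096;      /* 131072 pages */

        printf("old target: %lu pages\n", 15 + free / 80);       /* 1653  */
        printf("new target: %lu pages\n", 15 + free * 10 / 100); /* 13122 */
        printf("old min:    %lu pages\n", 10 + free / 100);      /* 1320  */
        printf("new min:    %lu pages\n", 10 + free * 8 / 100);  /* 10495 */
        return 0;
}

In other words, on such a machine the daemon now starts working while about 51 MiB are still free instead of about 6.5 MiB, which is what gives it the headroom mentioned above against swapping deadlocks.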