Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Reclaim memory as we hit the max_shares limit. The code for reclamation
>> is inspired by Dave Hansen's challenged memory controller and by the
>> shrink_all_memory() code.
>>
>> Reclamation can be triggered from two paths:
>>
>> 1. While incrementing the RSS, we hit the limit of the container
>> 2. A container is resized, such that its new limit is below its current
>>    RSS
>>
>> In (1) reclamation takes place in the background.
> 
> Hmm... So this is not a hard limit in this case, right? And on an
> overloaded system, between the moment the reclamation thread is woken
> up and the moment it starts shrinking zones, the container may touch
> too many pages...
> 
> That's not good.

Yes, please see my comments in the TODOs. Hard limits should be easy
to implement; it's a question of calling the correct routine based
on policy.
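
To make the policy point concrete, here is a rough userspace sketch, not
code from the patch. Every name in it (sim_memctlr, sim_charge, sim_resize,
the SIM_LIMIT_* values) is made up for illustration, and the reclaim_pending
flag only stands in for waking the background reclaim thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them come from the patch. */
enum sim_limit_policy { SIM_LIMIT_SOFT, SIM_LIMIT_HARD };

struct sim_memctlr {
	unsigned long rss;            /* pages currently charged */
	unsigned long limit;          /* max_shares expressed in pages */
	enum sim_limit_policy policy; /* what to do when the limit is hit */
	bool reclaim_pending;         /* stands in for waking a reclaim thread */
};

/*
 * Path 1: charge one page. Returns true if the page was charged.
 * Under the soft policy the charge always succeeds and reclaim is only
 * requested; under the hard policy the charge is refused at the limit.
 */
static bool sim_charge(struct sim_memctlr *c)
{
	assert(c != NULL);

	if (c->rss >= c->limit) {
		if (c->policy == SIM_LIMIT_HARD)
			return false;       /* caller must reclaim or fail */
		c->reclaim_pending = true;  /* soft: charge now, reclaim later */
	}
	c->rss++;
	return true;
}

/*
 * Path 2: resize the container. Returns how many pages would have to be
 * reclaimed to bring the RSS back under the new limit.
 */
static unsigned long sim_resize(struct sim_memctlr *c, unsigned long new_limit)
{
	assert(c != NULL);

	c->limit = new_limit;
	return c->rss > new_limit ? c->rss - new_limit : 0;
}
```

With the soft policy the RSS can overshoot the limit by however many pages
are charged before reclaim catches up, which is the behaviour you are
objecting to.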

> 
>> TODOs
>>
>> 1. max_shares currently works like a soft limit. The RSS can grow beyond its
>>    limit. One possible fix is to introduce a soft limit (reclaim when the
>>    container hits the soft limit) and fail when we hit the hard limit
> 
> Such a soft limit doesn't help either. It just makes the effects on
> a lightly loaded system smoother.
> 
> And what about a hard limit: how would you fail in a page fault when
> the limit is hit? SIGKILL/SEGV is not an option; in this case we
> should run synchronous reclamation. This is done in the beancounter
> patches v6 we sent recently.
> 

I thought about running synchronous reclamation but did not follow
that approach, because I was not sure whether calling the reclaim routines
from page fault context is a good thing to do. It's worth trying out, since
it would provide better control over RSS.
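
As a starting point for that experiment, the synchronous variant could look
roughly like the userspace sketch below. It is only an illustration under
my own assumptions: sim_hard_container, sim_reclaim, sim_charge_sync and
the retry bound are invented names and values, and sim_reclaim() merely
stands in for a direct call into the reclaim code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_RETRIES 3	/* arbitrary bound chosen for this sketch */

/* Hypothetical model of a container with a hard limit. */
struct sim_hard_container {
	unsigned long rss;         /* pages currently charged */
	unsigned long limit;       /* hard limit in pages */
	unsigned long reclaimable; /* pages a reclaim pass could still free */
};

/* Stand-in for a direct reclaim pass; returns the number of pages freed. */
static unsigned long sim_reclaim(struct sim_hard_container *c,
				 unsigned long want)
{
	unsigned long freed;

	assert(c != NULL && c->reclaimable <= c->rss);

	freed = want < c->reclaimable ? want : c->reclaimable;
	c->reclaimable -= freed;
	c->rss -= freed;
	return freed;
}

/*
 * Charge one page. At the limit, reclaim synchronously and retry instead
 * of letting the RSS overshoot. Returns false only when reclaim cannot
 * make room, leaving the failure policy to the caller.
 */
static bool sim_charge_sync(struct sim_hard_container *c)
{
	int tries;

	for (tries = 0; tries <= SIM_MAX_RETRIES; tries++) {
		if (c->rss < c->limit) {
			c->rss++;
			return true;
		}
		if (sim_reclaim(c, 1) == 0)
			break;	/* nothing left to reclaim */
	}
	return false;
}
```

The RSS never exceeds the limit in this model; the cost is that the
faulting task pays for the reclaim work itself.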


>> Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
>> ---
>>
>> --- linux-2.6.19-rc2/mm/vmscan.c~container-memctlr-reclaim   2006-11-09 22:21:11.000000000 +0530
>> +++ linux-2.6.19-rc2-balbir/mm/vmscan.c      2006-11-09 22:21:11.000000000 +0530
>> @@ -36,6 +36,8 @@
>>  #include <linux/rwsem.h>
>>  #include <linux/delay.h>
>>  #include <linux/kthread.h>
>> +#include <linux/container.h>
>> +#include <linux/memctlr.h>
>>  
>>  #include <asm/tlbflush.h>
>>  #include <asm/div64.h>
>> @@ -65,6 +67,9 @@ struct scan_control {
>>      int swappiness;
>>  
>>      int all_unreclaimable;
>> +
>> +    int overlimit;
>> +    void *container;        /* Added as void * to avoid #ifdef's */
>>  };
>>  
>>  /*
>> @@ -811,6 +816,10 @@ force_reclaim_mapped:
>>              cond_resched();
>>              page = lru_to_page(&l_hold);
>>              list_del(&page->lru);
>> +            if (!memctlr_page_reclaim(page, sc->container, sc->overlimit)) {
>> +                    list_add(&page->lru, &l_active);
>> +                    continue;
>> +            }
>>              if (page_mapped(page)) {
>>                      if (!reclaim_mapped ||
>>                          (total_swap_pages == 0 && PageAnon(page)) ||
> 
> [snip] See comment below.
> 
>>  
>> +#ifdef CONFIG_RES_GROUPS_MEMORY
>> +/*
>> + * Modelled after shrink_all_memory
>> + */
>> +unsigned long memctlr_shrink_container_memory(unsigned long nr_pages,
>> +                                            struct container *container,
>> +                                            int overlimit)
>> +{
>> +    unsigned long lru_pages;
>> +    unsigned long ret = 0;
>> +    int pass;
>> +    struct zone *zone;
>> +    struct scan_control sc = {
>> +            .gfp_mask = GFP_KERNEL,
>> +            .may_swap = 0,
>> +            .swap_cluster_max = nr_pages,
>> +            .may_writepage = 1,
>> +            .swappiness = vm_swappiness,
>> +            .overlimit = overlimit,
>> +            .container = container,
>> +    };
>> +
> 
> [snip]
> 
>> +            for (prio = DEF_PRIORITY; prio >= 0; prio--) {
>> +                    unsigned long nr_to_scan = nr_pages - ret;
>> +
>> +                    sc.nr_scanned = 0;
>> +                    ret += shrink_all_zones(nr_to_scan, prio, pass, &sc);
>> +                    if (ret >= nr_pages)
>> +                            break;
>> +
>> +                    if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
>> +                            blk_congestion_wait(WRITE, HZ / 10);
>> +            }
>> +    }
>> +    return ret;
>> +}
>> +#endif
> 
> Please correct me if I'm wrong, but does this reclamation work like
> "run over all the zones' lists searching for pages whose controller
> is sc->container"?
> 

Yeah, that's correct. The code can also reclaim memory from all over-the-limit
containers (by passing SC_OVERLIMIT_ALL). The idea behind using such a scheme
is to ensure that the global LRU list is not broken.
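
In simplified form, the scan behaves roughly like the userspace sketch
below. This is a model for discussion only: the sim_* names, the array
standing in for the LRU list, and the exact filter semantics (including
SIM_OVERLIMIT_ALL as a stand-in for SC_OVERLIMIT_ALL) are assumptions, not
the code in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_OVERLIMIT_ALL 1	/* hypothetical stand-in for SC_OVERLIMIT_ALL */

struct sim_container {
	unsigned long rss;	/* pages currently charged */
	unsigned long limit;	/* limit in pages */
};

struct sim_page {
	struct sim_container *owner;	/* container charged for this page */
	int reclaimed;			/* nonzero once the page is freed */
};

/* Nonzero if the page may be reclaimed on behalf of the given target. */
static int sim_page_reclaim(const struct sim_page *page,
			    const struct sim_container *target, int overlimit)
{
	if (page->owner == NULL)
		return 0;
	if (overlimit == SIM_OVERLIMIT_ALL)
		return page->owner->rss > page->owner->limit;
	return page->owner == target;
}

/*
 * Walk the shared list in order and reclaim only matching pages.
 * Pages that do not match are skipped and keep their position, so the
 * global ordering is left intact.
 */
static unsigned long sim_shrink(struct sim_page *lru, size_t n,
				unsigned long nr_pages,
				const struct sim_container *target,
				int overlimit)
{
	unsigned long freed = 0;
	size_t i;

	for (i = 0; i < n && freed < nr_pages; i++) {
		if (lru[i].reclaimed ||
		    !sim_page_reclaim(&lru[i], target, overlimit))
			continue;
		assert(lru[i].owner->rss > 0);
		lru[i].reclaimed = 1;
		lru[i].owner->rss--;
		freed++;
	}
	return freed;
}
```

The drawback is visible in the loop: every page on the list is examined
even when only a few belong to the target container.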


-- 
        Thanks for the feedback,
        Balbir Singh,
        Linux Technology Center,
        IBM Software Labs

_______________________________________________
ckrm-tech mailing list
https://lists.sourceforge.net/lists/listinfo/ckrm-tech
