Hi Chandra,

Response embedded below.


> On Tue, Apr 19, 2005 at 04:48:38PM -0400, Marc E. Fiuczynski wrote:
> > Hi Chandra,
> >
> > The following program will be killed in our system by the memory
> > controller (both the E17 based one and a prior one):
> >
> > dd bs=4096 count=250000 < /dev/zero > /bigfile
> >
> > The class within which this command executes is set up as follows:
> >
> > res=mem,guarantee=-2,limit=125000,total_guarantee=100,max_limit=100
> >
> > The default class has the following:
> >
> > res=mem,guarantee=-2,limit=-2,total_guarantee=322735,max_limit=322735
>
> That is bad behavior....

Right... a process should not die simply because its use of the buffer cache
caused it to overrun the memory limit.
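
To put rough numbers on this, here is a minimal sketch. It assumes 4 KiB
pages and that the class's limit=125000 is effectively counted in pages;
neither is stated above, so treat both as assumptions. The helper name
dd_pages is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def dd_pages(bs, count, page_size=PAGE_SIZE):
    """Pages of page cache touched by a dd run of `count` blocks of `bs` bytes."""
    total_bytes = bs * count
    # Round up to whole pages.
    return -(-total_bytes // page_size)


written = dd_pages(4096, 250000)  # the dd command quoted above
limit = 125000                    # assumption: limit is in pages

print(written, limit, written / limit)
```

Under those assumptions dd pushes about twice the class limit through the
page cache, even though its own RSS stays small.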

> Have you played with the config parameters to see if it helps?

Not yet.

> > Is it really the case that dd consumes that much memory and
> > therefore must be killed?  I think this is unlikely, as
> > observing its VIRT and RSS from the top output shows that
> > its size does not grow beyond a few megabytes.
>
> Yes, its RSS/VIRT doesn't increase... but those are not what we are
> accounting....
>
> We account the pages that come into lru list(active/inactive)....
>
> Maybe the controller should also follow RSS and usage of the class and be
> aggressive in cleaning up the active/inactive lists. I'll think about it.

I cannot imagine that the pages in the inactive list are dirty. I.e., dd is
writing the data out to disk.  So maybe it is ok to be more aggressive in
cleaning out the class's clean pages from the inactive list.
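
What I have in mind is roughly the following. This is only a toy model in
Python, not the controller's code: the Page type, the deque standing in for
a class's inactive list, and the function name are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    dirty: bool


def reclaim_clean_inactive(inactive, usage, target):
    """Drop clean pages from the cold end of `inactive` until usage <= target.

    Dirty pages are skipped and put back in their original order.
    Returns the class usage (in pages) after reclaim.
    """
    skipped = deque()
    while inactive and usage > target:
        page = inactive.popleft()
        if page.dirty:
            skipped.append(page)   # needs writeback first; leave it alone
        else:
            usage -= 1             # clean page: can be freed immediately
    # Restore the skipped dirty pages at the cold end, preserving order.
    inactive.extendleft(reversed(skipped))
    return usage


# Hypothetical example: 8 inactive pages, every fourth one dirty.
inactive = deque(Page(i, dirty=(i % 4 == 0)) for i in range(8))
usage = reclaim_clean_inactive(inactive, usage=8, target=4)
```

Only if a pass like this cannot bring the class back under its target would
the controller need to consider anything more drastic.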

> >
> > Or, is the memory controller keeping track of pages that
> > logically no longer belong to the class.  Looking at
> > output of dmesg, I see the following bef_shnk_cls and
> > aft_shnk_cls debug messages:
> >
> === stuff deleted ===
> >
> > Not sure how to interpret this exactly, but it seems to me that
> > the bulk of the pages are in the inae/lina lists.  Maybe the
> > mem controller should be more aggressive in cleaning out these
> > lists before killing a process like dd.
>
> The logic is purposely simple and less-intrusive... If a class is
> over its limit, then we try to bring the class's (lru list's)
> usage to a level (as specified in the config file) by using the
> VM subsystem's shrink logic.
>
> There are a few problems if we aggressively clean up the lru lists.
>       - we may be throwing out pages which may be pulled in
>       by the task in the very near future.

This is only the case if the processes within the same class approach or
exceed the memlimit.

>       - if a class needs a lot more than the limit, we will be
>       swapping big time to the disk, which would slow down
>       the other processes in the system

Why?  Because the class is consuming too much I/O bandwidth?  Well, that's
what a working I/O controller should solve, right?

> We cannot differentiate the above dd case from a legitimate use
> of a class (unless we monitor the RSS usage as mentioned above).

Well, I wouldn't go so far as to classify whatever dd is doing as
"not legitimate".

Marc


