Hi Torsten,

Thanks for the thorough explanation.

I used to be a big proponent of ASL, but I've relied on it less ever
since you pointed out, years ago, that the shared sizes were not
accurate on Linux.  I know that smaps helps with that, but it seems
fairly expensive so I've avoided it.

These days I like to use a reasonable MaxRequestsPerChild (e.g. 100)
combined with a fairly high size limit in ASL.  That just helps to
catch any unusual growth in a process.

As for what to do with ASL, I think it probably does make more sense
to use RSS since we're trying to avoid swapping, but shared is not
reliable enough for me to trust anymore.  I don't think that an admin
switching off swap while a server is live is worth worrying about, and
people already have to do their own measurement and tuning to choose
sizes when setting this up.

- Perrin

2011/2/11 Torsten Förtsch <torsten.foert...@gmx.net>:
> Hi,
>
> there is an ongoing discussion initiated by Max whether Apache::SizeLimit does
> the right thing in reporting the current amount of RAM a process does not
> share with any other process as
>
>  unshared_size = total_size - shared_size
>
> Max suggests changing that definition to
>
>  unshared_size = rss - shared_size
>
> Besides the fact that such a change should be announced very loudly, perhaps
> by a new major version, because it requires current installations to adjust
> their parameters, I am not sure whether it is the right way.
>
> (I am talking about Linux here)
>
> What does that mean?
> ====================
>
> The total size of a process comprises its complete address space. Normally,
> far from all of this space is present in RAM. When the process accesses
> a part of its address space that is not present the CPU generates an interrupt
> and the operating system reads in that piece from disk or allocates an empty
> page and thus makes the accessed page present. Then the operation is repeated
> and this time it succeeds. The process normally is not aware of all this.
>
> The part of the process that is really present in RAM is the RSS.
>
> Now, Linux comes with the /proc/$PID/smaps file that reports sizes for
> shared and private portions of the process' address space.
>
> How does that work?
> ===================
>
> Linux organizes the RAM and address spaces in fixed-size chunks, so-called
> pages (normally 4 kB). Now, a single page of RAM can belong to only one
> process or it can be used by multiple processes (for example because they use
> the same algorithmic part of the C library that is read-only). So, each page
> has a reference count. If that refcount is 1 the page is used by only one
> process and hence private to it. If the refcount is >1 the page is shared
> among multiple processes.
>
> When /proc/$PID/smaps is read for a process Linux walks all pages of the
> process and classifies them in 3 groups:
>
> - the page is present in RAM and has a refcount==1
>    ==> add it to the process total size and to the private portion
>
> - the page is present in RAM and has a refcount>1
>    ==> add it to the process total size and to the shared portion
>
> - the page is not present in RAM
>    ==> add it to the process total size
>
> The point here is, for a page that is not present Linux cannot read the
> refcount because that count is also not present in RAM. So, to decide if a
> page is shared or not it would have to read in the page. This is too expensive
> an operation only to read the refcount.
>
> So, while in theory a page is either used by only one process and hence
> private, or by multiple and hence shared, in practice we have
>
>  total_size = private + shared + notpresent
>
> where notpresent is either shared or private, we cannot know.
>
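
A minimal sketch of the bookkeeping described above, assuming the usual
"Name:  N kB" layout of /proc/$PID/smaps. The excerpt and its numbers are
made up for illustration, and the function name summarize_smaps is
hypothetical. It sums the per-mapping fields and derives notpresent as
whatever part of the total size is reported neither as shared nor as private.

```python
import re

# Hypothetical, abbreviated smaps excerpt; the values are invented.
SAMPLE_SMAPS = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /usr/sbin/httpd
Size:                464 kB
Rss:                 424 kB
Shared_Clean:        424 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
09e3f000-0a5e1000 rw-p 00000000 00:00 0          [heap]
Size:               7816 kB
Rss:                5000 kB
Shared_Clean:          0 kB
Shared_Dirty:       1200 kB
Private_Clean:         0 kB
Private_Dirty:      3800 kB
"""

# Matches field lines such as "Rss:   424 kB"; mapping header lines do not match.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum smaps fields over all mappings and derive the not-present part (kB)."""
    totals = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            totals[m.group(1)] = totals.get(m.group(1), 0) + int(m.group(2))
    size = totals.get("Size", 0)
    shared = totals.get("Shared_Clean", 0) + totals.get("Shared_Dirty", 0)
    private = totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0)
    return {
        "size": size,
        "shared": shared,
        "private": private,
        # total_size = private + shared + notpresent, solved for notpresent
        "notpresent": size - shared - private,
    }


if __name__ == "__main__":
    print(summarize_smaps(SAMPLE_SMAPS))
```

On a live system the text would come from reading /proc/<pid>/smaps for the
process of interest; the sample string only keeps the sketch self-contained.
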
> How are processes created?
> ==========================
>
> Under Linux a process is created by the fork() or clone() system calls. In
> theory the operating system duplicates the complete address space of the
> process calling fork. One copy belongs to the original process (the parent),
> the other is for the new process (the child).
>
> But if we really had to copy the whole address space fork() would be a really
> expensive operation. In fact, only a table holding pointers to the pages that
> comprise the parent's address space is duplicated. And all pages are marked
> read-only and their reference count is incremented.
>
> Now, if one of the processes wants to write to a page the CPU again generates
> an interrupt because the page is marked as read-only. The operating system
> catches that interrupt. And only now the actual page is duplicated. One page
> for the writing process and one for the others. The refcount of the new page
> becomes 1; that of the old page is decremented. This working pattern is
> called copy-on-write.
>
> With the Apache web server we have one parent process that spawns many
> children. At first, almost all of the child's address space is shared with the
> parent due to copy-on-write. Over its lifetime the child's private address
> space grows in two ways:
>
> * it allocates more memory
>  ==> total_size grows, unshared grows but shared stays the same.
> * it writes to portions that were initially shared with the parent
>  ==> unshared grows, shared shrinks but total_size does not change.
>
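
The two growth modes above can be modelled with a toy bookkeeping class. This
is only an illustration of the arithmetic, not of what the kernel does: the
class ChildMemory and its numbers are hypothetical, and pages that are not
present in RAM are ignored.

```python
from dataclasses import dataclass


@dataclass
class ChildMemory:
    """Toy model of one Apache child; all sizes in kB (hypothetical)."""
    shared: int
    unshared: int

    @property
    def total_size(self):
        return self.shared + self.unshared

    def allocate(self, kb):
        # New memory is private: total_size and unshared grow, shared stays.
        self.unshared += kb

    def write_to_shared(self, kb):
        # Copy-on-write: pages move from shared to unshared, total_size stays.
        kb = min(kb, self.shared)
        self.shared -= kb
        self.unshared += kb


if __name__ == "__main__":
    child = ChildMemory(shared=9000, unshared=1000)
    child.allocate(500)
    print(child.total_size, child.shared, child.unshared)
    child.write_to_shared(2000)
    print(child.total_size, child.shared, child.unshared)
```
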
> Now, what is the goal of Apache::SizeLimit?
> ===========================================
>
> If the overall working set of all apache children becomes larger than the
> available RAM of the system, the operating system has to fetch code and/or
> data from disk for each request, and by doing so it has to evict pages that
> will be needed by the next request shortly after.
>
> Apache::SizeLimit (ASL hereafter) tries to avoid this situation.
>
> Note, there is no problem with large process sizes and heavy swap space usage
> if the data remains there and is normally not used.
>
> ASL can monitor a few values per apache child, the total size of the process,
> the RSS portion, the portion that is reported as shared and the private part.
>
> As for the "unshared" value above it can be defined as:
> (all rvalues are reported by /proc/$PID/smaps)
>
>  1) unshared = size - shared
>
>  2) unshared = rss - shared
>
>  3) unshared = private
>    this is just the same as 2) because rss = shared + private
>
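
To make the three candidate definitions concrete, here is a small sketch with
invented numbers (all in kB). The function name unshared_candidates and the
sample values are hypothetical. The values are chosen so that
rss = shared + private holds, as stated above.

```python
def unshared_candidates(size, rss, shared, private):
    """Return the three candidate definitions of 'unshared' (all in kB)."""
    return {
        "1) size - shared": size - shared,
        "2) rss - shared": rss - shared,
        "3) private": private,
    }


if __name__ == "__main__":
    # Hypothetical process: 8280 kB address space, 5424 kB of it resident.
    values = unshared_candidates(size=8280, rss=5424, shared=1624, private=3800)
    for name, kb in values.items():
        print(f"{name:18s} {kb:6d} kB")
```

With these numbers, definitions 2) and 3) agree, while 1) is larger by
exactly the part of the address space that is not present in RAM.
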
> What are the implications?
> ==========================
>
> 1) since size = shared + private + notpresent, unshared becomes
>
>  unshared = private + notpresent
>
> Of notpresent we know nothing for sure. It is a mix of shared and private
> pages.
>
> Now, if an administrator turns off swapping (swapoff /dev/...) the part of
> notpresent that has been there becomes present. So, we can expect shared to
> grow considerably. unshared will shrink by the same amount.
>
> As for absolute values, unshared in this case is quite a large number because
> it includes notpresent.
>
> 2) here unshared lacks the notpresent part. So the actual number is much less
> than it would be in case 1.
>
> But if an administrator turns off swapping now a part of notpresent will be
> added to unshared. unshared may suddenly jump over the limit in all apache
> children.
>
> Well, an administrator doing that on a busy web server should be converted
> into a httpd ...
>
> So, I am quite undecided what to do.
>
> Please comment!
>
> See also
>  http://foertsch.name/ModPerl-Tricks/Measuring-memory-consumption/index.shtml
>
>
> What else could be done to hit ASL's goal?
> ==========================================
>
> There are a few other status fields that could possibly be used:
>
> - "Swap" in /proc/$PID/smaps
>  don't know for sure what that means but sounds good. Need to inspect the
>  kernel code
>
> - "Referenced" in /proc/$PID/smaps
>  can be used to find out how much RAM a process has accessed since the last
>  reset of the counter. We could reset it in PerlPostReadRequestHandler and
>  read in a $r->pool cleanup.
>
> - "Pss" in /proc/$PID/smaps
>  proportional set size: each present page counted as its size divided by
>  its refcount
>
> - "VmSwap" in /proc/$PID/status
>  for example: terminate if the process starts to use swap space
>
> Certainly more.
>
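
A rough sketch of the VmSwap idea from the list above, assuming a kernel that
reports a "VmSwap:" line in /proc/$PID/status. The sample text, the function
names and the max_swap_kb threshold are hypothetical; how a child would
actually be terminated is left out.

```python
import re

# Hypothetical excerpt of /proc/<pid>/status; the values are invented.
SAMPLE_STATUS = """\
Name:   httpd
VmSize:   123456 kB
VmRSS:     45678 kB
VmSwap:      512 kB
"""


def vm_swap_kb(status_text):
    """Return the VmSwap value in kB, or None if the field is missing."""
    m = re.search(r"^VmSwap:\s+(\d+) kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def should_terminate(status_text, max_swap_kb=0):
    """True if the process uses more swap than the (assumed) threshold."""
    swap = vm_swap_kb(status_text)
    return swap is not None and swap > max_swap_kb


if __name__ == "__main__":
    print(vm_swap_kb(SAMPLE_STATUS), should_terminate(SAMPLE_STATUS))
```
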
> Torsten Förtsch
>
> --
> Need professional modperl support? Hire me! (http://foertsch.name)
>
> Like fantasy? http://kabatinte.net
>
