Hi,

I would go with rss - shared_size. Especially on 64-bit platforms,
total_size gives far too high values (even without swap space). Using
other values like Pss or Swap is not possible on older kernels (I don't
have these values on EC2 instances, for example). An option would be to
subtract Swap from unshared_size if it is present. Personally, I don't
care whether swapped-out memory is shared or not.
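A minimal sketch of the two options in this reply. It assumes the smaps fields have already been summed per process into a dict with the hypothetical keys "size", "rss", "shared" and, on newer kernels only, "swap" (all in kB). Reading "unshared_size" in the Swap option as the current size - shared definition is an assumption.

```python
def unshared_rss(m):
    """Preferred definition in this reply: unshared = rss - shared (kB)."""
    return m["rss"] - m["shared"]


def unshared_size_minus_swap(m):
    """Alternative: size - shared, minus Swap when the kernel reports it (kB)."""
    unshared = m["size"] - m["shared"]
    # Older kernels do not report Swap at all; then nothing is subtracted.
    return unshared - m.get("swap", 0)


old_kernel = {"size": 100000, "rss": 30000, "shared": 20000}
new_kernel = dict(old_kernel, swap=5000)
print(unshared_rss(old_kernel))              # 10000
print(unshared_size_minus_swap(old_kernel))  # 80000
print(unshared_size_minus_swap(new_kernel))  # 75000
```

Using dict.get() keeps the Swap variant working on kernels that lack the field, which is the EC2 situation described above.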

Hendrik

On Fri, 11.02.2011, 15:26, Torsten Förtsch wrote:
> Hi,
>
> there is an ongoing discussion, initiated by Max, about whether
> Apache::SizeLimit does the right thing in reporting the current amount of
> RAM a process does not share with any other process as
>
>   unshared_size = total_size - shared_size
>
> Max suggests changing that definition to
>
>   unshared_size = rss - shared_size
>
> Besides the fact that such a change should be announced very loudly,
> perhaps by a new major version, because it requires current installations
> to adjust their parameters, I am not sure whether it is the right way.
>
> (I am talking about Linux here)
>
> What does that mean?
> ====================
>
> The total size of a process comprises its complete address space.
> Normally, only part of this space is present in RAM. When the process
> accesses a part of its address space that is not present, the CPU
> generates an interrupt (a page fault) and the operating system reads that
> piece in from disk or allocates an empty page, thus making the accessed
> page present. Then the operation is repeated and this time it succeeds.
> The process normally is not aware of all this.
>
> The part of the process that is actually present in RAM is the RSS
> (resident set size).
>
> Now, Linux provides the /proc/$PID/smaps file, which reports sizes for the
> shared and private portions of the process's address space.
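A minimal Python sketch of reading these numbers: it sums every "Name: N kB" line over all mappings in smaps text, then derives shared and private from the Shared_*/Private_* fields. The function names and summary keys are hypothetical, and which fields appear depends on the kernel version.

```python
import re

# Matches per-mapping counter lines such as "Rss:                  20 kB".
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def parse_smaps(text):
    """Sum every '<Name>: <n> kB' field over all mappings in smaps text."""
    totals = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:  # mapping headers and flag lines (e.g. VmFlags) do not match
            name, kb = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + kb
    return totals


def summarize(totals):
    """Collapse the smaps sums into the four values discussed here (kB)."""
    shared = totals.get("Shared_Clean", 0) + totals.get("Shared_Dirty", 0)
    private = totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0)
    return {"size": totals.get("Size", 0), "rss": totals.get("Rss", 0),
            "shared": shared, "private": private}
```

On a Linux box, summarize(parse_smaps(open("/proc/self/smaps").read())) reports the current interpreter; for an apache child, substitute its PID.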
>
> How does that work?
> ===================
>
> Linux organizes RAM and address spaces in fixed-size chunks, so-called
> pages (normally 4 KiB). A single page of RAM can belong to only one
> process, or it can be used by multiple processes (for example, because
> they use the same read-only code of the C library). So, each page has a
> reference count. If that refcount is 1, the page is used by only one
> process and hence is private to it. If the refcount is >1, the page is
> shared among multiple processes.
>
> When /proc/$PID/smaps is read for a process, Linux walks all pages of the
> process and classifies them into 3 groups:
>
> - the page is present in RAM and has a refcount==1
>     ==> add it to the process total size and to the private portion
>
> - the page is present in RAM and has a refcount>1
>     ==> add it to the process total size and to the shared portion
>
> - the page is not present in RAM
>     ==> add it to the process total size
>
> The point here is that for a page that is not present, Linux cannot read
> the refcount, because that count is also not present in RAM. So, to decide
> whether a page is shared or not, it would have to read in the page. That is
> too expensive an operation just to read the refcount.
>
> So, while in theory a page is either used by only one process and hence
> private, or by multiple processes and hence shared, in practice we have
>
>   total_size = private + shared + notpresent
>
> where notpresent is either shared or private; we cannot know which.
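The three-way classification and the resulting identity can be illustrated with a toy model. Page and classify() are hypothetical names; a refcount of None stands for "not readable while the page is absent".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    present: bool
    refcount: Optional[int] = None  # None: not readable while the page is absent


def classify(pages):
    """Sort pages into the three groups smaps distinguishes (counts, not kB)."""
    private = shared = notpresent = 0
    for page in pages:
        if not page.present:
            notpresent += 1      # counted in the total only
        elif page.refcount == 1:
            private += 1
        else:
            shared += 1
    return {"total": len(pages), "private": private,
            "shared": shared, "notpresent": notpresent}


counts = classify([Page(True, 1), Page(True, 3), Page(True, 2), Page(False)])
print(counts)  # {'total': 4, 'private': 1, 'shared': 2, 'notpresent': 1}
```

Whatever the mix, total always equals private + shared + notpresent, and nothing in the result says how the notpresent page would split.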
>
> How are processes created?
> ==========================
>
> Under Linux, a process is created by the fork() or clone() system calls. In
> theory, the operating system duplicates the complete address space of the
> process calling fork(). One copy belongs to the original process (the
> parent), the other to the new process (the child).
>
> But if we really had to copy the whole address space, fork() would be a
> really expensive operation. In fact, only the table holding pointers to the
> pages that comprise the parent's address space is duplicated. All pages
> are marked read-only and their reference counts are incremented.
>
> Now, if one of the processes wants to write to a page, the CPU again
> generates an interrupt because the page is marked read-only. The operating
> system catches that interrupt, and only now is the actual page duplicated:
> one page for the writing process and one for the others. The refcount of
> the new page becomes 1 and that of the old one is decremented. This working
> pattern is called copy-on-write.
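A toy Python model of fork() and copy-on-write, purely illustrative. Frame and Process are hypothetical classes, and the read-only trap is emulated by checking the refcount on every write rather than by real page protection.

```python
class Frame:
    """A physical page of RAM with a reference count."""

    def __init__(self, data):
        self.data = data
        self.refcount = 1


class Process:
    def __init__(self, frames):
        self.table = list(frames)  # page table: virtual page number -> Frame

    def fork(self):
        # Only the page table is copied; every frame gains a reference.
        for frame in self.table:
            frame.refcount += 1
        return Process(self.table)

    def write(self, vpn, data):
        frame = self.table[vpn]
        if frame.refcount > 1:
            # Emulated write fault: the writer gets a private copy and the
            # old frame loses one reference.
            frame.refcount -= 1
            frame = Frame(frame.data)
            self.table[vpn] = frame
        frame.data = data


parent = Process([Frame("code"), Frame("heap")])
child = parent.fork()
child.write(1, "child heap")
print(parent.table[1].data, parent.table[1].refcount)  # heap 1
print(child.table[1].data, child.table[1].refcount)    # child heap 1
print(parent.table[0] is child.table[0])               # True
```

The untouched "code" page stays shared with refcount 2; only the written page is duplicated.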
>
> With the apache web server we have one parent process that spawns many
> children. At first, almost all of the child's address space is shared with
> the parent due to copy-on-write. Over its lifetime, the child's private
> address space grows in 2 ways:
>
> * it allocates more memory
>   ==> total_size grows, unshared grows but shared stays the same.
> * it writes to portions that were initially shared with the parent
>   ==> unshared grows, shared shrinks but total_size does not change.
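Ignoring notpresent, the bookkeeping of these two growth paths can be sketched as follows. ChildMemory is a hypothetical model that only counts pages.

```python
from dataclasses import dataclass


@dataclass
class ChildMemory:
    """Hypothetical page counts of one apache child (notpresent ignored)."""
    shared: int
    unshared: int

    @property
    def total(self):
        return self.shared + self.unshared

    def allocate(self, pages):
        # Fresh memory belongs to this child only.
        self.unshared += pages

    def write_shared(self, pages):
        # Copy-on-write turns formerly shared pages into private ones.
        self.shared -= pages
        self.unshared += pages


child = ChildMemory(shared=900, unshared=100)
child.allocate(50)
print(child.total, child.shared, child.unshared)  # 1050 900 150
child.write_shared(200)
print(child.total, child.shared, child.unshared)  # 1050 700 350
```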
>
> Now, what is the goal of Apache::SizeLimit?
> ===========================================
>
> If the combined working set of all apache children becomes larger than the
> available RAM, the operating system has to fetch code and/or data from disk
> for each request, and by doing so it has to evict pages that will be needed
> by the next request shortly after.
>
> Apache::SizeLimit (ASL hereafter) tries to avoid this situation.
>
> Note that large process sizes and heavy swap space usage are no problem as
> long as the swapped-out data stays there and is normally not used.
>
> ASL can monitor a few values per apache child: the total size of the
> process, the RSS portion, the portion that is reported as shared, and the
> private part.
>
> As for the "unshared" value above, it can be defined as:
> (all right-hand sides are reported by /proc/$PID/smaps)
>
>  1) unshared = size - shared
>
>  2) unshared = rss - shared
>
>  3) unshared = private
>     this is just the same as 2) because rss = shared + private
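The three definitions side by side, applied to per-process smaps sums in kB. over_limit() is a hypothetical check in the spirit of ASL's unshared-size limit; it is not the module's API.

```python
def unshared(m, definition):
    """Definitions 1-3 above, applied to per-process smaps sums in kB."""
    if definition == 1:
        return m["size"] - m["shared"]
    if definition == 2:
        return m["rss"] - m["shared"]
    if definition == 3:
        return m["private"]
    raise ValueError(f"unknown definition: {definition!r}")


def over_limit(m, max_unshared_kb, definition=2):
    """Hypothetical check: should this child exit after the current request?"""
    return unshared(m, definition) > max_unshared_kb


m = {"size": 200000, "rss": 60000, "shared": 45000, "private": 15000}
print([unshared(m, d) for d in (1, 2, 3)])  # [155000, 15000, 15000]
print(over_limit(m, 100000, definition=1))  # True
print(over_limit(m, 100000))                # False
```

With the same limit, the same child is killed under definition 1 and kept under definition 2. That is why a switch would force installations to retune their parameters.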
>
> What are the implications?
> ==========================
>
> 1) since size = shared + private + notpresent, unshared becomes
>
>   unshared = private + notpresent
>
> Of notpresent we know nothing for sure. It is a mix of shared and private
> pages.
>
> Now, if an administrator turns off swapping (swapoff /dev/...), the part of
> notpresent that was in swap becomes present. So, we can expect shared to
> grow considerably, and unshared will shrink by the same amount.
>
> As for absolute values, unshared in this case is quite a large number
> because it includes notpresent.
>
> 2) here unshared lacks the notpresent part, so the actual number is much
> smaller than in case 1.
>
> But if an administrator turns off swapping now, a part of notpresent will
> be added to unshared. unshared may suddenly jump over the limit in all
> apache children.
>
> Well, an administrator doing that on a busy web server should be converted
> into an httpd ...
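The swapoff effect on definitions 1 and 2 can be sketched numerically. How the swapped part of notpresent splits into shared and private is exactly what smaps cannot report, so the two amounts passed to the hypothetical after_swapoff() are assumptions.

```python
def after_swapoff(m, swapped_shared, swapped_private):
    """smaps-style sums (kB) once every swapped-out page is back in RAM.

    The shared/private split of the swapped pages is an assumed input.
    """
    return {
        "size": m["size"],  # the address space itself is unchanged
        "rss": m["rss"] + swapped_shared + swapped_private,
        "shared": m["shared"] + swapped_shared,
        "private": m["private"] + swapped_private,
    }


def both_definitions(m):
    """(definition 1, definition 2) of unshared, in kB."""
    return m["size"] - m["shared"], m["rss"] - m["shared"]


before = {"size": 200000, "rss": 60000, "shared": 45000, "private": 15000}
after = after_swapoff(before, swapped_shared=30000, swapped_private=20000)
print(both_definitions(before))  # (155000, 15000)
print(both_definitions(after))   # (125000, 35000)
```

Definition 1 drops by the newly visible shared part. Definition 2 rises by the newly visible private part, which is the sudden jump over the limit described above.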
>
> So, I am quite undecided what to do.
>
> Please comment!
>
> See also
>   http://foertsch.name/ModPerl-Tricks/Measuring-memory-consumption/index.shtml
>
>
> What else could be done to hit ASL's goal?
> ==========================================
>
> There are a few other status fields that could possibly be used:
>
> - "Swap" in /proc/$PID/smaps
>   don't know for sure what that means, but it sounds good. Need to inspect
>   the kernel code.
>
> - "Referenced" in /proc/$PID/smaps
>   can be used to find out how much RAM a process has accessed since the
>   last reset of the counter. We could reset it in a
>   PerlPostReadRequestHandler and read it in a $r->pool cleanup.
>
> - "Pss" in /proc/$PID/smaps
>   the proportional set size: each resident page counts with its size
>   divided by the number of processes sharing it
>
> - "VmSwap" in /proc/$PID/status
>   for example: terminate if the process starts to use swap space
>
> Certainly more.
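Two of these fields can be sketched in Python. pss_kb() is a simplified version of the Pss idea over hypothetical (size_kb, sharers) pairs; vm_swap_kb() and uses_swap() are hypothetical helpers that parse /proc/$PID/status text, where VmSwap may be absent on older kernels.

```python
import re


def pss_kb(pages):
    """Simplified Pss: each resident page counts size / number of sharers.

    `pages` holds (size_kb, sharers) pairs for the resident pages only.
    """
    return sum(size / sharers for size, sharers in pages)


def vm_swap_kb(status_text):
    """VmSwap from /proc/$PID/status text, or None if the kernel lacks it."""
    match = re.search(r"^VmSwap:\s+(\d+) kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def uses_swap(status_text):
    # The "terminate once the process starts to use swap" policy above.
    swap = vm_swap_kb(status_text)
    return swap is not None and swap > 0


print(pss_kb([(4, 1), (4, 2), (4, 4)]))                  # 7.0
print(uses_swap("Name:\thttpd\nVmSwap:\t     128 kB\n"))  # True
print(uses_swap("Name:\thttpd\n"))                       # False
```

Summed over all apache children, Pss approximates their combined RAM use without the double counting of shared pages. That is close to the working-set question ASL actually cares about.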
>
> Torsten Förtsch