Hi Torsten,

Thanks for the thorough explanation.

I used to be a big proponent of ASL, but I have relied on it less since you
pointed out years ago that the shared sizes are not accurate on Linux. I know
that smaps helps with that, but it seems fairly expensive, so I've avoided it.
These days I like to use a reasonable MaxRequestsPerChild (e.g. 100) combined
with a fairly high size limit in ASL. That just helps to catch any unusual
growth in a process.

As for what to do with ASL, I think it probably does make more sense to use
RSS since we're trying to avoid swapping, but shared is not reliable enough
for me to trust anymore. I don't think that an admin switching off swap while
a server is live is worth worrying about, and people already have to do their
own measurement and tuning to choose sizes when setting this up.

- Perrin

2011/2/11 Torsten Förtsch <torsten.foert...@gmx.net>:
> Hi,
>
> there is an ongoing discussion initiated by Max whether Apache::SizeLimit does
> the right thing in reporting the current amount of RAM a process does not
> share with any other process as
>
>   unshared_size = total_size - shared_size
>
> Max suggests changing that definition to
>
>   unshared_size = rss - shared_size
>
> Besides the fact that that change should be announced very loudly, perhaps by
> a new major version, because it requires current installations to adjust their
> parameters, I am not sure whether it is the right way.
>
> (I am talking about Linux here)
>
> What does that mean?
> ====================
>
> The total size of a process comprises its complete address space. Normally, by
> far not everything of this space is present in RAM. When the process accesses
> a part of its address space that is not present the CPU generates an interrupt
> and the operating system reads in that piece from disk or allocates an empty
> page and thus makes the accessed page present. Then the operation is repeated
> and this time it succeeds. The process normally is not aware of all this.
>
> The part of the process that is really present in RAM is the RSS.
>
> Now, Linux comes with the /proc/$PID/smaps file that reports sizes for
> shared and private portions of the process' address space.
>
> How does that work?
> ===================
>
> Linux organizes the RAM and address spaces in fixed size chunks, so called
> pages (normally 4kb). Now, a single page of RAM can belong to only one
> process or it can be used by multiple processes (for example because they use
> the same algorithmic part of the C library that is read-only). So, each page
> has a reference count. If that refcount is 1 the page is used by only one
> process and hence private to it. If the refcount is >1 the page is shared
> among multiple processes.
>
> When /proc/$PID/smaps is read for a process Linux walks all pages of the
> process and classifies them in 3 groups:
>
> - the page is present in RAM and has a refcount==1
>   ==> add it to the process total size and to the private portion
>
> - the page is present in RAM and has a refcount>1
>   ==> add it to the process total size and to the shared portion
>
> - the page is not present in RAM
>   ==> add it to the process total size
>
> The point here is, for a page that is not present Linux cannot read the
> refcount because that count is also not present in RAM. So, to decide if a
> page is shared or not it would have to read in the page. This is too expensive
> an operation only to read the refcount.
>
> So, while in theory a page is either used by only one process and hence
> private or by multiple and hence shared, in practice we have
>
>   total_size = private + shared + notpresent
>
> where notpresent is either shared or private, we cannot know.
>
> How are processes created?
> ==========================
>
> Under Linux a process is created by the fork() or clone() system calls. In
> theory the operating system duplicates the complete address space of the
> process calling fork.
> One copy belongs to the original process (the parent),
> the other is for the new process (the child).
>
> But if we really had to copy the whole address space fork() would be a really
> expensive operation. In fact, only a table holding pointers to the pages that
> comprise the parent's address space is duplicated. And all pages are marked
> read-only and their reference count is incremented.
>
> Now, if one of the processes wants to write to a page the CPU again generates
> an interrupt because the page is marked as read-only. The operating system
> catches that interrupt. And only now the actual page is duplicated. One page
> for the writing process and one for the others. The refcount of the new page
> becomes 1, that of the old is decremented. This working pattern is called
> copy-on-write.
>
> With the apache web server we have one parent process that spawns many
> children. At first, almost all of the child's address space is shared with the
> parent due to copy-on-write. Over its lifetime the child's private address
> space grows by 2 means:
>
> * it allocates more memory
>   ==> total_size grows, unshared grows but shared stays the same.
> * it writes to portions that were initially shared with the parent
>   ==> unshared grows, shared shrinks but total_size does not change.
>
> Now, what is the goal of Apache::SizeLimit?
> ===========================================
>
> If the overall working set of all apache children becomes larger than the
> RAM available on the system, the operating system has to fetch code and/or
> data from the disk for each request, and by doing so it has to evict pages
> that will be needed by the next request shortly after.
>
> Apache::SizeLimit (ASL hereafter) tries to avoid this situation.
>
> Note, there is no problem with large process sizes and heavy swap space usage
> if the data remains there and is normally not used.
>
> ASL can monitor a few values per apache child: the total size of the process,
> the RSS portion, the portion that is reported as shared and the private part.
>
> As for the "unshared" value above it can be defined as:
> (all rvalues are reported by /proc/$PID/smaps)
>
> 1) unshared = size - shared
>
> 2) unshared = rss - shared
>
> 3) unshared = private
>    this is just the same as 2) because rss = shared + private
>
> What are the implications?
> ==========================
>
> 1) since size = shared + private + notpresent, unshared becomes
>
>    unshared = private + notpresent
>
> Of notpresent we know nothing for sure. It is a mix of shared and private
> pages.
>
> Now, if an administrator turns off swapping (swapoff /dev/...) the part of
> notpresent that was on the swap device becomes present. So, we can expect
> shared to grow considerably. unshared will shrink by the same amount.
>
> As for absolute values, unshared in this case is quite a large number because
> it includes notpresent.
>
> 2) here unshared lacks the notpresent part. So the actual number is much less
> than it would be in case 1.
>
> But if an administrator turns off swapping now a part of notpresent will be
> added to unshared. unshared may suddenly jump over the limit in all apache
> children.
>
> Well, an administrator doing that on a busy web server should be converted
> into a httpd ...
>
> So, I am quite undecided what to do.
>
> Please comment!
>
> See also
> http://foertsch.name/ModPerl-Tricks/Measuring-memory-consumption/index.shtml
>
>
> What else could be done to hit ASL's goal?
> ==========================================
>
> There are a few other status fields that can possibly be used:
>
> - "Swap" in /proc/$PID/smaps
>   don't know for sure what that means but sounds good. Need to inspect the
>   kernel code
>
> - "Referenced" in /proc/$PID/smaps
>   can be used to find out how much RAM a process has accessed since the last
>   reset of the counter.
>   We could reset it in PerlPostReadRequestHandler and
>   read it in a $r->pool cleanup.
>
> - "Pss" in /proc/$PID/smaps
>   segment size divided by the refcount
>
> - "VmSwap" in /proc/$PID/status
>   for example: terminate if the process starts to use swap space
>
> Certainly more.
>
> Torsten Förtsch
>
> --
> Need professional modperl support? Hire me! (http://foertsch.name)
>
> Like fantasy? http://kabatinte.net
>
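
For anyone who wants to compare the three candidate definitions of "unshared" from the quoted message on their own processes, here is a minimal sketch in Python. It is not how Apache::SizeLimit does it. The function names (sum_smaps, unshared_candidates, read_smaps) are made up for this illustration, and the SAMPLE text contains invented numbers laid out like smaps output, not measurements from a real server. It assumes the usual smaps field names (Size, Rss, Shared_Clean, Shared_Dirty, Private_Clean, Private_Dirty).

```python
import re
from collections import defaultdict

# Field lines in smaps look like "Rss:     1234 kB". Mapping header lines
# and non-kB fields do not match this pattern and are skipped.
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps(text):
    """Sum every kB-valued field over all mappings in smaps-formatted text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


def unshared_candidates(totals):
    """Return the three 'unshared' definitions from the message, in kB."""
    shared = totals["Shared_Clean"] + totals["Shared_Dirty"]
    private = totals["Private_Clean"] + totals["Private_Dirty"]
    return {
        "size_minus_shared": totals["Size"] - shared,  # definition 1
        "rss_minus_shared": totals["Rss"] - shared,    # definition 2
        "private": private,                            # definition 3
    }


def read_smaps(pid="self"):
    """Read /proc/<pid>/smaps (Linux only; hypothetical helper)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return handle.read()


# Invented two-mapping example: one mostly shared code mapping and one
# private heap-like mapping that is only partly present in RAM.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /usr/sbin/httpd
Size:                 44 kB
Rss:                  40 kB
Shared_Clean:         36 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                 200 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       200 kB
"""

if __name__ == "__main__":
    for name, kb in unshared_candidates(sum_smaps(SAMPLE)).items():
        print(f"{name}: {kb} kB")
```

With the invented sample, definition 1 comes out far larger than definitions 2 and 3 because it includes the pages that are not present, and definitions 2 and 3 agree because rss = shared + private. On a Linux box, unshared_candidates(sum_smaps(read_smaps(pid))) applies the same arithmetic to a live process.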