apr pools memory leaks
I have interesting memory leak data to share with these two lists (crossposting to both svn and apr dev lists). Ever since we launched svn-on-bigtable over at Google (about 2 years ago), we've been struggling with mysterious memory leaks in apache -- very similar to what users are complaining about in Subversion issue 3084. After lots of analysis, here's what we've figured out so far.

Symptom: When you have a process that runs for a very long time while making use of APR pools, the global pool tends to fragment into tiny pieces, and APR just keeps on malloc()ing without ever calling free(). In other words, a guaranteed long-and-slow leak. Most people don't notice this problem with httpd, because they run httpd in prefork mode: a bunch of httpd processes that only serve 1000 requests, then die and get re-spawned. They never live long enough to exhibit the leak. But if you run apache in threaded mode, and let the same apache run for days and weeks, it leaks a *lot*.

Cause: If you look at APR's pool code, you can see the main reason for fragmentation. In a nutshell, it never recombines recycled memory. For example, suppose over an hour I create 20 subpools each 5k in size, then apr_pool_destroy() them in turn. APR then places these blocks into a 'free memory' list for future recycling. If I then create a new subpool that requires 3k, no problem -- APR gives me back one of the existing 5k blocks to use. But if I create a subpool that requires 20k, whoops, it just goes and malloc()s 20k from the OS, rather than combining four adjacent blocks from the 'free' list.

Our solution: Over at Google, we simply hacked APR to *never* hold on to blocks for recycling. Essentially, this makes apr_pool_destroy() always free() the block, and makes apr_pool_create() always call malloc(). Poof, the memory leak went away instantly.
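To make the scenario under "Cause" concrete, here is a toy model in C of a recycling allocator that never merges free blocks. It is a sketch under stated assumptions, not APR's pool code: the names toy_block, toy_allocator, toy_alloc, toy_release and toy_replay_scenario are invented for illustration, the free list is a plain first-fit list, and byte counts cover payload only. It replays the 20 x 5k, then 3k, then 20k sequence described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: NOT APR's allocator. All names here are hypothetical. */

typedef struct toy_block {
    struct toy_block *next;
    size_t size;              /* usable payload bytes in this block */
} toy_block;

typedef struct {
    toy_block *free_list;     /* recycled blocks; never merged, never free()d */
    size_t bytes_from_os;     /* total payload bytes obtained via malloc() */
    size_t bytes_idle;        /* payload bytes currently parked on the free list */
} toy_allocator;

/* First fit: hand back any recycled block that is large enough,
 * otherwise fall through to malloc(). Blocks are never combined. */
toy_block *toy_alloc(toy_allocator *a, size_t size)
{
    toy_block **link;
    toy_block *b;

    for (link = &a->free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= size) {
            b = *link;
            *link = b->next;
            a->bytes_idle -= b->size;
            return b;
        }
    }
    /* No single block fits, even if the list holds more than `size`
     * bytes in total, so the request goes to the OS. */
    b = malloc(sizeof(*b) + size);
    assert(b != NULL);
    b->size = size;
    b->next = NULL;
    a->bytes_from_os += size;
    return b;
}

/* "Destroying" a block just parks it on the free list. */
void toy_release(toy_allocator *a, toy_block *b)
{
    b->next = a->free_list;
    a->free_list = b;
    a->bytes_idle += b->size;
}

/* Replays the example: 20 blocks of 5k created and destroyed, then a 3k
 * request and a 20k request. Blocks are intentionally left allocated,
 * since the modelled allocator never calls free(). */
toy_allocator toy_replay_scenario(void)
{
    toy_allocator a = { NULL, 0, 0 };
    toy_block *pools[20];
    int i;

    for (i = 0; i < 20; i++)
        pools[i] = toy_alloc(&a, 5 * 1024);
    for (i = 0; i < 20; i++)
        toy_release(&a, pools[i]);

    (void)toy_alloc(&a, 3 * 1024);    /* served from a recycled 5k block */
    (void)toy_alloc(&a, 20 * 1024);   /* no 5k block fits: malloc() again */
    return a;
}
```

Calling toy_replay_scenario() from a main() and printing bytes_from_os and bytes_idle shows the effect: the 20k request adds a fresh 20k to the total taken from the OS while nineteen 5k blocks sit unused on the free list.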
What was more troubling is that the use of the MaxMemFree directive -- which is supposed to limit the total size of the 'free memory' recycling list -- didn't seem to work for us. What we need to do is go back and debug this more carefully, and see if it's a bug in APR, apache, or just in our testing methodology. But I think there's still got to be something wrong with MaxMemFree, since users are claiming it's not working for them in issue 3084. Something is fishy. We plan to look into it more, but since users are screaming, maybe someone else can beat us to it...

In the long term, I think we need to question the utility of having APR do memory recycling at all. Back in the early 90's, malloc() was insanely slow and worth avoiding. In 2008, now that we're running apache with nothing but malloc/free, we're unable to measure any performance hit. The whole pool interface is really nice, but I wonder if pool recycling may just be unnecessary on modern hardware and OSes.
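The idea of capping the recycling list, and the limiting case of never recycling at all, can be sketched with another toy model in C. Everything here is an assumption made for illustration: cap_block, cap_allocator, cap_alloc, cap_release and cap_churn are invented names, and max_idle is a hypothetical stand-in for the kind of limit MaxMemFree is supposed to impose. In this sketch a cap of 0 means "keep nothing", which is a choice of the sketch and not a statement about how the real directive treats 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: NOT APR's allocator. All names here are hypothetical. */

typedef struct cap_block {
    struct cap_block *next;
    size_t size;              /* usable payload bytes in this block */
} cap_block;

typedef struct {
    cap_block *free_list;     /* recycled blocks kept for reuse */
    size_t bytes_idle;        /* payload bytes parked on the free list */
    size_t max_idle;          /* hypothetical cap on bytes_idle */
    size_t malloc_calls;      /* how often we went to malloc() */
    size_t free_calls;        /* how often we gave memory back with free() */
} cap_allocator;

/* First fit from the free list, otherwise malloc(). */
cap_block *cap_alloc(cap_allocator *a, size_t size)
{
    cap_block **link;
    cap_block *b;

    for (link = &a->free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= size) {
            b = *link;
            *link = b->next;
            a->bytes_idle -= b->size;
            return b;
        }
    }
    b = malloc(sizeof(*b) + size);
    assert(b != NULL);
    b->size = size;
    b->next = NULL;
    a->malloc_calls++;
    return b;
}

/* Keep the block for recycling only while the list stays under the cap;
 * otherwise hand it straight back with free(). */
void cap_release(cap_allocator *a, cap_block *b)
{
    if (a->bytes_idle + b->size > a->max_idle) {
        free(b);
        a->free_calls++;
        return;
    }
    b->next = a->free_list;
    a->free_list = b;
    a->bytes_idle += b->size;
}

/* Create and destroy `rounds` blocks of `size` bytes, one after another. */
cap_allocator cap_churn(size_t max_idle, size_t size, int rounds)
{
    cap_allocator a = { NULL, 0, 0, 0, 0 };
    int i;

    a.max_idle = max_idle;
    for (i = 0; i < rounds; i++)
        cap_release(&a, cap_alloc(&a, size));
    return a;
}
```

With max_idle set to 0, cap_churn() degenerates into one malloc() and one free() per round, which is the shape of the "never recycle" change described above; with a cap larger than the block size, a single block is reused and malloc() is called once. Whether the extra malloc()/free() traffic costs anything measurable is exactly the open question raised in this message.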
Re: apr pools memory leaks
On Wed, Oct 1, 2008 at 2:11 PM, Ben Collins-Sussman [EMAIL PROTECTED] wrote:
> I have interesting memory leak data to share with these two lists (crossposting to both svn and apr dev lists). Ever since we launched svn-on-bigtable over at Google (about 2 years ago), we've been struggling with mysterious memory leaks in apache -- very similar to what users are complaining about in Subversion issue 3084. After lots of analysis, here's what we've figured out so far.

It is good to see some analysis on this issue. Here is link BTW:

http://subversion.tigris.org/issues/show_bug.cgi?id=3084

A couple questions:

1) This seems to happen only with Apache 2.2 and not 2.0. Is there any explanation for that supported by your analysis?

2) It seems like many of the people, at least on Windows, can reproduce this problem quickly. Could this just be due to running requests which create/destroy a lot of memory?

3) Any reason more Windows users would see this than Linux? Maybe more Windows SVN users use Apache 2.2 than on Linux?

--
Thanks
Mark Phippard
http://markphip.blogspot.com/
Re: apr pools memory leaks
On Wed, Oct 1, 2008 at 8:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
> On Wed, Oct 1, 2008 at 2:11 PM, Ben Collins-Sussman [EMAIL PROTECTED] wrote:
>> I have interesting memory leak data to share with these two lists (crossposting to both svn and apr dev lists). Ever since we launched svn-on-bigtable over at Google (about 2 years ago), we've been struggling with mysterious memory leaks in apache -- very similar to what users are complaining about in Subversion issue 3084. After lots of analysis, here's what we've figured out so far.
>
> It is good to see some analysis on this issue. Here is link BTW:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=3084
>
> A couple questions:
>
> 1) This seems to happen only with Apache 2.2 and not 2.0. Is there any explanation for that supported by your analysis?
>
> 2) It seems like many of the people, at least on Windows, can reproduce this problem quickly. Could this just be due to running requests which create/destroy a lot of memory?
>
> 3) Any reason more Windows users would see this than Linux? Maybe more Windows SVN users use Apache 2.2 than on Linux?

Windows doesn't support prefork mode; only threaded operation. On Linux/Unix the default mode of operation of Apache is some sort of creation of disposable processes. The threaded operation in Windows doesn't have that (a disposable process which cleans up any memory management issues).

Bye,
Erik.
Re: apr pools memory leaks
On Wed, Oct 1, 2008 at 2:39 PM, Erik Huelsmann [EMAIL PROTECTED] wrote:
> Windows doesn't support prefork mode; only threaded operation. On Linux/Unix the default mode of operation of Apache is some sort of creation of disposable processes. The threaded operation in Windows doesn't have that (a disposable process which cleans up any memory management issues).

OK. The way I read Ben's email is that the reason you do not see this in plain Apache was that it is usually run in pre-fork. I thought it was possible in SVN regardless. It sounds like when SVN is used in an Apache that is using pre-fork these processes are being cleaned up regularly which frees memory.

--
Thanks
Mark Phippard
http://markphip.blogspot.com/
Re: apr pools memory leaks
On Wed, Oct 1, 2008 at 1:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
> On Wed, Oct 1, 2008 at 2:11 PM, Ben Collins-Sussman [EMAIL PROTECTED] wrote:
>> I have interesting memory leak data to share with these two lists (crossposting to both svn and apr dev lists). Ever since we launched svn-on-bigtable over at Google (about 2 years ago), we've been struggling with mysterious memory leaks in apache -- very similar to what users are complaining about in Subversion issue 3084. After lots of analysis, here's what we've figured out so far.
>
> It is good to see some analysis on this issue. Here is link BTW:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=3084
>
> A couple questions:
>
> 1) This seems to happen only with Apache 2.2 and not 2.0. Is there any explanation for that supported by your analysis?

As far as I know, this is an APR issue, not an Apache issue... and I don't think the pool code has changed for at least 6 or 7 years...?

> 2) It seems like many of the people, at least on Windows, can reproduce this problem quickly. Could this just be due to running requests which create/destroy a lot of memory?

Definitely. A single checkout causes zillions of subpools to be repeatedly created and destroyed. Just look at all the looping constructs in libsvn_fs! If you run apache in prefork mode, you won't see this problem -- no apache process lasts very long. If you run apache in threaded (mpm) mode, the apache process runs forever, and the leak becomes obvious.

> 3) Any reason more Windows users would see this than Linux? Maybe more Windows SVN users use Apache 2.2 than on Linux?

As Erik said, on Windows only the threaded mode is available, thus explaining why they're seeing this problem more than anyone else.
Re: apr pools memory leaks
On Wed, Oct 1, 2008 at 2:47 PM, Ben Collins-Sussman [EMAIL PROTECTED] wrote:
>> 2) It seems like many of the people, at least on Windows, can reproduce this problem quickly. Could this just be due to running requests which create/destroy a lot of memory?
>
> Definitely. A single checkout causes zillions of subpools to be repeatedly created and destroyed. Just look at all the looping constructs in libsvn_fs! If you run apache in prefork mode, you won't see this problem -- no apache process lasts very long. If you run apache in threaded (mpm) mode, the apache process runs forever, and the leak becomes obvious.

That's not entirely accurate. Many of the threaded MPMs (e.g. Worker) use multiple subprocesses with multiple threads each, and restart each subprocess periodically. The configuration used at Google happens to limit it to one subprocess with many threads, and doesn't restart it periodically, but that's not the default configuration IIRC.

-garrett
Re: apr pools memory leaks
Ben Collins-Sussman wrote:
> On Wed, Oct 1, 2008 at 1:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
>> 3) Any reason more Windows users would see this than Linux? Maybe more Windows SVN users use Apache 2.2 than on Linux?
>
> As Erik said, on Windows only the threaded mode is available, thus explaining why they're seeing this problem more than anyone else.

Also, most distributions shipped prefork as the default mpm back in 2.0 and now ship worker as the default with 2.2. winnt mpm looks much more like worker, of course.