apr pools memory leaks

2008-10-01 Thread Ben Collins-Sussman
I have interesting memory leak data to share with these two lists
(crossposting to both svn and apr dev lists).

Ever since we launched svn-on-bigtable over at Google (about 2 years
ago), we've been struggling with mysterious memory leaks in apache --
very similar to what users are complaining about in Subversion issue
3084.

After lots of analysis, here's what we've figured out so far.


Symptom:

When you have a process that runs for a very long time while making
use of APR pools, the global pool tends to fragment into tiny pieces,
and APR just keeps on malloc()ing without ever calling free().  In
other words, a guaranteed long-and-slow leak.

Most people don't notice this problem with httpd, because they run
httpd in prefork mode: a bunch of httpd processes that only serve 1000
requests, then die and get re-spawned.  They never live long enough
to exhibit the leak.  But if you run apache in threaded mode, and let the
same apache run for days and weeks, it leaks a *lot*.


Cause:

If you look at APR's pool code, you can see the main reason for
fragmentation.  In a nutshell, it never recombines recycled memory.
For example, suppose over an hour I create 20 subpools each 5k in
size, then apr_pool_destroy() them in turn.  APR then places these
blocks into a 'free memory' list for future recycling.  If I then
create a new subpool that requires 3k, no problem -- APR gives me back
one of the existing 5k blocks to use.  But if I create a subpool that
requires 20k, whoops, it just goes and malloc()s 20k from the OS,
rather than combining four adjacent blocks from the 'free' list.
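
To make the scenario above concrete, here is a minimal sketch in C. It is
a toy model of a recycling list that never merges blocks, not APR's actual
allocator; every name in it (toy_allocator, toy_alloc, toy_release,
demo_fragmentation) is made up for illustration, and it only does
bookkeeping rather than touching real memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_FREE_BLOCKS 64

/* Toy model only -- NOT APR's allocator. All names are hypothetical. */
typedef struct {
    size_t free_sizes[MAX_FREE_BLOCKS]; /* sizes of recycled blocks */
    int    free_count;                  /* how many blocks are recycled */
    size_t bytes_from_os;               /* total "malloc()ed" so far */
} toy_allocator;

/* Hand out the first recycled block that is big enough.  Blocks are
   never merged, so a large request falls through to the OS. */
static size_t toy_alloc(toy_allocator *a, size_t want)
{
    for (int i = 0; i < a->free_count; i++) {
        if (a->free_sizes[i] >= want) {
            size_t got = a->free_sizes[i];
            /* Fill the hole with the last entry of the list. */
            a->free_sizes[i] = a->free_sizes[--a->free_count];
            return got;
        }
    }
    a->bytes_from_os += want;
    return want;
}

/* Keep the block for recycling; nothing goes back to the OS. */
static void toy_release(toy_allocator *a, size_t size)
{
    assert(a->free_count < MAX_FREE_BLOCKS);
    a->free_sizes[a->free_count++] = size;
}

/* Total bytes sitting idle on the recycling list. */
static size_t toy_free_bytes(const toy_allocator *a)
{
    size_t total = 0;
    for (int i = 0; i < a->free_count; i++)
        total += a->free_sizes[i];
    return total;
}

/* The scenario from the text: 20 x 5k, then 3k, then 20k. */
static void demo_fragmentation(toy_allocator *a)
{
    size_t blocks[20];
    for (int i = 0; i < 20; i++)
        blocks[i] = toy_alloc(a, 5 * 1024);
    for (int i = 0; i < 20; i++)
        toy_release(a, blocks[i]);

    toy_alloc(a, 3 * 1024);   /* served from a recycled 5k block */
    toy_alloc(a, 20 * 1024);  /* no single block fits: new 20k from OS */

    printf("from OS: %zu bytes, idle on free list: %zu bytes\n",
           a->bytes_from_os, toy_free_bytes(a));
}
```

Calling demo_fragmentation() on a zero-initialized toy_allocator from
main() shows the 20k request growing the total taken from the OS even
though most of the 5k blocks are still sitting idle on the list.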


Our solution:

Over at Google, we simply hacked APR to *never* hold on to blocks for
recycling.  Essentially, this makes apr_pool_destroy() always free()
the block, and makes apr_pool_create() always call malloc().
Poof, all the memory leaks went away instantly.
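
As a rough illustration of the "never recycle" approach, here is a small
sketch in C. It is not the actual patch to APR; passthrough_alloc,
passthrough_release, bytes_held and demo_passthrough are hypothetical
names, and the counter only tracks what this wrapper itself is holding
(whether the C library hands freed memory back to the OS is up to the
malloc implementation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper, not APR code.  The union keeps the payload
   that follows the header suitably aligned for any type. */
typedef union {
    size_t      size;
    max_align_t align;
} block_header;

static size_t bytes_held = 0;  /* bytes this wrapper currently owns */

/* Every request goes straight to malloc(); nothing is recycled. */
static void *passthrough_alloc(size_t size)
{
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    bytes_held += size;
    return h + 1;
}

/* Every release goes straight to free(); no list is kept. */
static void passthrough_release(void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    bytes_held -= h->size;
    free(h);
}

/* Same workload as the 20 x 5k example; returns the bytes still held
   once all the small blocks have been released. */
static size_t demo_passthrough(void)
{
    void *blocks[20];
    for (int i = 0; i < 20; i++)
        blocks[i] = passthrough_alloc(5 * 1024);
    for (int i = 0; i < 20; i++)
        passthrough_release(blocks[i]);

    size_t idle = bytes_held;  /* nothing is retained for recycling */

    void *big = passthrough_alloc(20 * 1024);
    printf("idle after releases: %zu, held with 20k live: %zu\n",
           idle, bytes_held);
    passthrough_release(big);
    return idle;
}
```

The trade-off is one malloc()/free() pair per block in exchange for never
accumulating idle blocks inside the wrapper.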

What was more troubling is that the use of the MaxMemFree directive --
which is supposed to limit the total size of the 'free memory'
recycling list -- didn't seem to work for us.  What we need to do is
go back and debug this more carefully, and see if it's a bug in APR,
apache, or just in our testing methodology.
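
For reference, the behavior we expected from a cap on the recycling list
can be sketched like this in C. This is only a model of the idea, not
APR's or apache's implementation of MaxMemFree; capped_list,
capped_release and demo_cap are hypothetical names, and treating a cap of
0 as "unlimited" is a convention of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bookkeeping-only model of a capped recycling list (hypothetical). */
typedef struct {
    size_t max_free;    /* cap on recycled bytes; 0 means no cap here */
    size_t free_bytes;  /* bytes currently kept for recycling */
    size_t returned;    /* bytes handed back instead of being kept */
} capped_list;

/* Keep the block only if doing so stays within the cap. */
static void capped_release(capped_list *l, size_t size)
{
    if (l->max_free != 0 && l->free_bytes + size > l->max_free)
        l->returned += size;    /* would exceed the cap: give it back */
    else
        l->free_bytes += size;  /* within the cap: keep for reuse */
}

/* Release twenty 5k blocks and report where they ended up. */
static void demo_cap(capped_list *l)
{
    for (int i = 0; i < 20; i++)
        capped_release(l, 5 * 1024);
    printf("cap %zu: kept %zu bytes, returned %zu bytes\n",
           l->max_free, l->free_bytes, l->returned);
}
```

With a 16k cap, only the first three 5k blocks stay on the list in this
model; with no cap, all twenty do. That bounded behavior is what we were
not observing in practice.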

But I think there's still got to be something wrong with MaxMemFree,
since users are claiming it's not working for them in issue 3084.
Something is fishy.  We plan to look into it more, but since users are
screaming, maybe someone else can beat us to it...

In the long term, I think we need to question the utility of having
APR do memory recycling at all.  Back in the early 90's, malloc() was
insanely slow and worth avoiding.  In 2008, now that we're running
apache with nothing but malloc/free, we're unable to measure any
performance hit.  The whole pool interface is really nice, but I
wonder if pool recycling may just be unnecessary on modern hardware
and OSes.


Re: apr pools memory leaks

2008-10-01 Thread Mark Phippard
On Wed, Oct 1, 2008 at 2:11 PM, Ben Collins-Sussman
[EMAIL PROTECTED] wrote:
 I have interesting memory leak data to share with these two lists
 (crossposting to both svn and apr dev lists).

 Ever since we launched svn-on-bigtable over at Google (about 2 years
 ago), we've been struggling with mysterious memory leaks in apache --
 very similar to what users are complaining about in Subversion issue
 3084.

 After lots of analysis, here's what we've figured out so far.

It is good to see some analysis on this issue.  Here is the link, BTW:

http://subversion.tigris.org/issues/show_bug.cgi?id=3084

A couple questions:

1) This seems to happen only with Apache 2.2 and not 2.0.  Is there
any explanation for that supported by your analysis?

2) It seems like many of the people, at least on Windows, can
reproduce this problem quickly.  Could this just be due to running
requests which create/destroy a lot of memory?

3) Any reason more Windows users would see this than Linux?  Maybe
more Windows SVN users use Apache 2.2 than on Linux?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: apr pools memory leaks

2008-10-01 Thread Erik Huelsmann
On Wed, Oct 1, 2008 at 8:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
 A couple questions:

 1) This seems to happen only with Apache 2.2 and not 2.0.  Is there
 any explanation for that supported by your analysis?

 2) It seems like many of the people, at least on Windows, can
 reproduce this problem quickly.  Could this just be due to running
 requests which create/destroy a lot of memory?

 3) Any reason more Windows users would see this than Linux?  Maybe
 more Windows SVN users use Apache 2.2 than on Linux?

Windows doesn't support prefork mode; only threaded operation. On
Linux/Unix the default mode of operation of Apache is some sort of
creation of disposable processes. The threaded operation in Windows
doesn't have that (a disposable process which cleans up any memory
management issues).

Bye,

Erik.


Re: apr pools memory leaks

2008-10-01 Thread Mark Phippard
On Wed, Oct 1, 2008 at 2:39 PM, Erik Huelsmann [EMAIL PROTECTED] wrote:

 Windows doesn't support prefork mode; only threaded operation. On
 Linux/Unix the default mode of operation of Apache is some sort of
 creation of disposable processes. The threaded operation in Windows
 doesn't have that (a disposable process which cleans up any memory
 management issues).

OK.  The way I read Ben's email, the reason you do not see this in
plain Apache is that it is usually run in pre-fork.  I thought it was
possible in SVN regardless.  It sounds like when SVN is used in an
Apache that is using pre-fork, these processes are being cleaned up
regularly, which frees the memory.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: apr pools memory leaks

2008-10-01 Thread Ben Collins-Sussman
On Wed, Oct 1, 2008 at 1:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
 A couple questions:

 1) This seems to happen only with Apache 2.2 and not 2.0.  Is there
 any explanation for that supported by your analysis?

As far as I know, this is an APR issue, not an Apache issue... and I
don't think the pool code has changed for at least 6 or 7 years...?



 2) It seems like many of the people, at least on Windows, can
 reproduce this problem quickly.  Could this just be due to running
 requests which create/destroy a lot of memory?

Definitely.  A single checkout causes zillions of subpools to be
repeatedly created and destroyed.   Just look at all the looping
constructs in libsvn_fs!

If you run apache in prefork mode, you won't see this problem -- no
apache process lasts very long.

If you run apache in threaded (mpm) mode, the apache process runs
forever, and the leak becomes obvious.


 3) Any reason more Windows users would see this than Linux?  Maybe
 more Windows SVN users use Apache 2.2 than on Linux?

As Erik said, on Windows only the threaded mode is available, thus
explaining why they're seeing this problem more than anyone else.


Re: apr pools memory leaks

2008-10-01 Thread Garrett Rooney
On Wed, Oct 1, 2008 at 2:47 PM, Ben Collins-Sussman
[EMAIL PROTECTED] wrote:

 2) It seems like many of the people, at least on Windows, can
 reproduce this problem quickly.  Could this just be due to running
 requests which create/destroy a lot of memory?

 Definitely.  A single checkout causes zillions of subpools to be
 repeatedly created and destroyed.   Just look at all the looping
 constructs in libsvn_fs!

 If you run apache in prefork mode, you won't see this problem -- no
 apache process lasts very long.

 If you run apache in threaded (mpm) mode, the apache process runs
 forever, and the leak becomes obvious.

That's not entirely accurate.  Many of the threaded MPMs (e.g. Worker)
use multiple subprocesses with multiple threads each, and restart each
subprocess periodically.  The configuration used at Google happens to
limit it to one subprocess with many threads, and doesn't restart it
periodically, but that's not the default configuration IIRC.

-garrett


Re: apr pools memory leaks

2008-10-01 Thread William A. Rowe, Jr.
Ben Collins-Sussman wrote:
 On Wed, Oct 1, 2008 at 1:31 PM, Mark Phippard [EMAIL PROTECTED] wrote:
 
 3) Any reason more Windows users would see this than Linux?  Maybe
 more Windows SVN users use Apache 2.2 than on Linux?
 
 As Erik said, on Windows only the threaded mode is available, thus
 explaining why they're seeing this problem more than anyone else.

Also, most distributions shipped prefork as the default mpm back in 2.0
and now ship worker as the default with 2.2.  winnt mpm looks much more
like worker, of course.