Hi all,
I (together with my co-authors, Ben Zorn & Kathryn McKinley) have just
submitted a paper to OOPSLA which discusses lots of custom allocation
strategies, including pools (more commonly known as regions). It
explains the benefits of pools, along with some of their drawbacks & a
proposed solution. Here's the abstract:
Programmers hoping to achieve performance improvements often use
custom memory allocators. This in-depth study examines eight
applications that use custom allocators. Surprisingly, for six of
these applications, a state-of-the-art general-purpose allocator
performs as well as or better than the custom allocators. The two
exceptions use regions, which deliver higher performance (improvements
of up to 44\%). Regions also reduce programmer burden and eliminate a
source of memory leaks. However, we show that the inability of
programmers to free individual objects within regions can lead to a
substantial increase in memory consumption. Worse, this limitation
precludes the use of regions in common programming idioms, reducing
their usefulness.
We present a generalization of general-purpose and region-based
allocators that we call {\em reaps}. Reaps are a combination of
regions and heaps, providing a full range of region semantics with the
addition of individual object deletion. We show that our
implementation of reaps provides high performance, outperforming other
allocators with region-like semantics. Our results indicate that most
programmers needing faster memory allocators should use a better
general-purpose allocator rather than writing a custom allocator, and
that programmers needing regions should instead use reaps.
A pre-print of the paper is available at:
http://www.cs.utexas.edu/users/emery/download.cgi?location=custom.pdf
--
Emery Berger
Dept. of Computer Science
The University of Texas at Austin
www.cs.utexas.edu/users/emery
> -----Original Message-----
> From: Karl Fogel [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 28, 2002 7:34 PM
> To: Jay Freeman (saurik)
> Cc: apr-dev
> Subject: Re: any documentation on the point of having pools?
>
> Jay,
>
> I can partially answer your question.
>
> Let's say there are three kinds of memory allocation in the world:
>
> 1. raw -- you know, like C malloc() and free()
> 2. pools
> 3. fully garbage-collected
>
> For the programmer, full GC is ideal. Unfortunately, it takes time
> for the GC code to figure out what's garbage and what's not, and to
> free it. Or if that phase is to be instantaneous, then there must be
> lots of little bits of overhead scattered all around, since all
> allocations will be required to do some GC bookkeeping. Usually GC is
> implemented with a mixture of these two strategies, but they total up
> to the same penalty, speaking *very* broadly of course.
>
> I don't want to get into one of programming's longest-running debates,
> but let's just say that despite occasional claims that full GC can, in
> principle, be just as efficient as "raw" allocation methods, in
> practice it never has been, and looks unlikely to be so in the near
> future. There are also some issues when it comes to interacting with
> non-GC'd languages, as you might expect. Not knocking GC -- my
> far-and-away favorite language is Lisp -- but GC comes with a penalty.
>
> Anyway, APR is written in C, and that's actually an important part of
> its design as a portability layer. So full GC would be technically,
> uh, difficult under the circumstances, even without considering the
> performance hit. :-)
>
> So let's look at the remaining two options: raw vs pools.
>
> Some programmers find pools easier to work with, some prefer raw
> allocation. We'll probably never get agreement on that.
>
> However, there is one nice thing about pools: they can fulfill the
> promise that GC never did -- the promise of being more efficient than
> malloc() and free(). The reason is that in raw-style allocation,
> every malloc() call must have a matching free() call. But a pool can
> clean up multiple mallocs() with one free(). I'm not talking about
> literal "malloc" calls, of course, but just the act of allocating
> something in that pool; and by "free" I mean apr_pool_clear or
> apr_pool_destroy, but you get the idea. Pool bookkeeping is done in
> such a way that you can mark the whole pool as reclaimable in one
> essentially constant-time operation, independent of how many objects
> (of whatever lengths) you may have allocated in that pool.
>
> Aside from the efficiency aspect (which I suspect is not so great as
> to be a major motivation, perhaps Sander or someone can comment?),
> people who like pools like them because they give a convenient idiom
> for expressing the lifetimes of objects. If you have a run of code
> that's going to cons up [er, excuse me, allocate] some objects, all of
> which need to remain valid for the duration of a certain set of
> operations, it's handy to put them all in the same pool, and just
> destroy the pool at the end. When the same code is written using raw
> allocation, it usually flaunts a dozen calls to free() at the end, and
> when you add a new object to that run of code, it's easy to forget to
> add yet another call to free(). Note that in the pool style, it's
> usually easy to see which pool you're supposed to allocate the thing
> in, or at least the presence of multiple pools there will force you to
> ask yourself about the object's lifetime, which malloc won't.
>
> Wow, I can't believe I stopped coding to write this :-). I hope it's
> at least technically accurate (fixes welcome!), if not persuasive.
> For the record, I like pools when I don't hate them.
>
> -Karl
>
> "Jay Freeman \(saurik\)" <[EMAIL PROTECTED]> writes:
> > Is there any documentation anywhere that describes "why you would
want
> to
> > use pools"? I've been using APR for over a year now in virtually
all of
> my
> > projects, and I _still_ don't get what the advantage of this pool
> management
> > that's strewn all over my programs is. I finally got fed up, wrote
a
> C++
> > class named "pool" (with an autocast operator for getting an
apr_pool_t
> *
> > and a destructor that destroys the pool), and have an instance of it
in
> > _every APR related object_ so I have something I can pass to the APR
> > functions when they scream out for their precious pools :-P. I pray
at
> > nights that I'm not using an insane amount of working set by doing
this,
> > hehe.
> >
> > [...]