Justin Erenkrantz wrote:

On Thu, Jul 26, 2001 at 10:50:58PM -0700, Brian Pane wrote:

But there's a problem with the SMS lock management.
According to gprof, every call to apr_sms_trivial_malloc
acquires and releases a lock.


Yup. I'm working on this right now. =) (What I have in my tree
isn't in commit shape yet.) Email me and I can send you a diff
of my tree.


I'm debating whether we're getting a win from the SMS stuff or not.
To find out, we need to kick out every lock we can.

Yep, I've been wondering the same thing lately.
The original pool implementation of apr_palloc is
very close to optimal.  In the last benchmark that
I did with non-mod_include requests, 99.998% of the
calls to apr_palloc didn't require locking.  So it's
going to be tough to improve upon the original design.

The biggest opportunity that I see for optimization
in the memory-management framework is in sub-pool
creation.  If an SMS-based implementation can reduce
the cost of creating sub-pools, it will speed up
mod_include.

But first, we may have a more fundamental problem:

Right now, I've got it so that most of the locks are in libc
(aka NIMBY), but the performance still doesn't match pools: SMS is
slower by a factor of 2. I'm scratching my head as to why this is.


hmmm...looking at the code, it makes sense that SMS is
half as fast as the original pools code.  I didn't realize
this until just now, but the polymorphism in the SMS framework
will probably make it impossible to match the performance of pools:

* apr_palloc (the original pools version) is a very lightweight
 function in the fast-path case where it doesn't need to
 acquire a lock.  It consists of a couple of integer/pointer
 arithmetic operations and two comparisons.

* The SMS-based implementation has to do essentially the same
 work, but it also does an extra function call (apr_sms_malloc
 calls apr_sms_trivial_malloc).

* If the cost of a function call is similar to the cost of
 the arithmetic operations and two comparisons in apr_palloc,
 that would explain why the SMS version is half as fast.
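
To make the comparison above concrete, here is a minimal sketch in C.
It is not the APR source: pool_t, pool_alloc, sms_t, sms_malloc,
sms_trivial_malloc, and sms_trivial_init are hypothetical stand-ins,
and the slow path (getting a new block, taking a lock) is left out and
simply returns NULL. The point is only that both paths do the same
bump-pointer arithmetic, while the SMS-style path reaches it through
an extra indirect call.

```c
#include <stddef.h>

/* Hypothetical bump allocator standing in for the pools fast path. */
typedef struct pool {
    char *first_avail;  /* next free byte in the current block */
    char *endp;         /* one past the end of the current block */
} pool_t;

/* Round a request up to a multiple of 8 bytes. */
#define ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

/* Shared fast path: a little arithmetic and two comparisons. */
static inline void *bump(pool_t *p, size_t size)
{
    size_t n = ALIGN8(size);
    if (n < size)                                  /* comparison 1: overflow */
        return NULL;
    if (n > (size_t)(p->endp - p->first_avail))    /* comparison 2: room left? */
        return NULL;                               /* slow path omitted */
    void *mem = p->first_avail;
    p->first_avail += n;
    return mem;
}

/* Pools style: the caller reaches the fast path with one direct call. */
void *pool_alloc(pool_t *p, size_t size)
{
    return bump(p, size);
}

/* SMS style: a polymorphic allocator with a per-instance function pointer. */
typedef struct sms sms_t;
struct sms {
    void *(*malloc_fn)(sms_t *self, size_t size);
    pool_t block;
};

/* One concrete implementation; does the same work as pool_alloc. */
static void *sms_trivial_malloc(sms_t *self, size_t size)
{
    return bump(&self->block, size);
}

/* Generic entry point: one extra, indirect call before any real work. */
void *sms_malloc(sms_t *self, size_t size)
{
    return self->malloc_fn(self, size);
}

/* Set up a "trivial" allocator over a caller-supplied buffer. */
void sms_trivial_init(sms_t *s, char *buf, size_t len)
{
    s->malloc_fn = sms_trivial_malloc;
    s->block.first_avail = buf;
    s->block.endp = buf + len;
}
```

A compiler can usually inline bump into pool_alloc, but it generally
cannot inline through malloc_fn, so the indirect call stays on every
allocation in the SMS-style path.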

-- justin

P.S. How are you using gprof?  I tried -pg and it just doesn't
work.  I switched to Forte 6.0U1's collect program now.  It
actually writes out info that I can use (er_print is a bit
awkward though).

I'm using gcc on Linux to build profiled code; it's not properly
including a profiled libc for reasons that I haven't had time to
debug yet, but it does a decent job of instrumenting the apr and
httpd code.

--Brian



