Justin Erenkrantz wrote:
On Thu, Jul 26, 2001 at 10:50:58PM -0700, Brian Pane wrote:
But there's a problem with the SMS lock management. According to gprof, every call to apr_sms_trivial_malloc acquires and releases a lock.
Yup. I'm working on this right now. =) (What I have in my tree right now isn't in commit shape.) Email me and I can send you my tree diff.
I'm debating whether we're getting a win by the SMS stuff or not. In order to do that, we need to kick out every lock we can.
Yep, I've been wondering the same thing lately. The original pool implementation of apr_palloc is very close to optimal. In the last benchmark that I did with non-mod_include requests, 99.998% of the calls to apr_palloc didn't require locking. So it's going to be tough to improve upon the original design.
The biggest opportunity that I see for optimization in the memory-management framework is in sub-pool creation. If an SMS-based implementation can reduce the cost of creating sub-pools, it will speed up mod_include.
But first, we may have a more fundamental problem:
Right now, I've got it so that most of the locks are now in libc
(aka NIMBY), but the performance still doesn't match pools (by a
factor of 2). I'm scratching my head as to why this is.
Hmmm... looking at the code, it makes sense that SMS is half as fast as the original pools code. I didn't realize this until just now, but the polymorphism in the SMS framework will probably make it impossible to match the performance of pools:
* apr_palloc (the original pools version) is a very lightweight function in the fast-path case where it doesn't need to acquire a lock. It consists of a couple of integer/pointer arithmetic operations and two comparisons.
* The SMS-based implementation has to do essentially the same work, but it also does an extra function call (apr_sms_malloc calls apr_sms_trivial_malloc).
* If the cost of a function call is similar to the cost of the two comparisons and the handful of arithmetic operations in apr_palloc, that would explain why the SMS version is half as fast.
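
To make the comparison above concrete, here is a rough sketch in C of the two call paths. It is not the actual APR code: the demo_* names, the struct layouts, and the 8-byte rounding are all made up for illustration, and the slow path (locking, getting a new block) is left out entirely. The point is only that the polymorphic version does the same bump-pointer work plus one call through a function pointer, which the compiler generally can't inline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-pointer pool; not APR's real pool structure. */
typedef struct {
    char *first_avail;  /* next free byte in the current block */
    char *endp;         /* one past the end of the current block */
} demo_pool;

void demo_pool_init(demo_pool *p, char *buf, size_t len)
{
    assert(p != NULL && buf != NULL);
    p->first_avail = buf;
    p->endp = buf + len;
}

/* Fast path in the style described for apr_palloc: round the size up,
 * check that it fits, bump the pointer. No lock, no indirect call. */
void *demo_palloc(demo_pool *p, size_t size)
{
    void *mem;
    size_t rounded = (size + 7u) & ~(size_t)7u;  /* assumed 8-byte rounding */

    if (rounded < size || rounded > (size_t)(p->endp - p->first_avail))
        return NULL;  /* a real allocator would take the locked slow path here */

    mem = p->first_avail;
    p->first_avail += rounded;
    return mem;
}

/* Polymorphic variant: the generic entry point dispatches through a
 * function pointer, the way apr_sms_malloc ends up in
 * apr_sms_trivial_malloc. */
typedef struct demo_sms demo_sms;
struct demo_sms {
    void *(*malloc_fn)(demo_sms *sms, size_t size);
    demo_pool pool;
};

void *demo_trivial_malloc(demo_sms *sms, size_t size)
{
    return demo_palloc(&sms->pool, size);  /* same work as the pool fast path */
}

void *demo_sms_malloc(demo_sms *sms, size_t size)
{
    return sms->malloc_fn(sms, size);      /* the extra indirect call */
}
```

Timing a tight loop of demo_palloc against demo_sms_malloc would be one way to check whether the indirect call alone accounts for a factor of 2; I haven't measured that here.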
-- justin
P.S. You are using gprof, how? I tried -pg and it just doesn't work. I switched to Forte 6.0U1's collect program now. It actually writes out info that I can use (er_print is a bit awkward though).
I'm using gcc on Linux to build profiled code; it's not properly including a profiled libc for reasons that I haven't had time to debug yet, but it does a decent job of instrumenting the apr and httpd code.
--Brian
