Jean-Sebastien Delfino wrote:
I'm on vacation, so if I find time in the next few days I'm going to do some coding in sca-cpp
...


I'd also like to try and compare sca-cpp's performance with the HTTPD multi-threaded 'worker' and 'prefork' modules... not sure I'll get to it though, as I have a lot of other non-computer-related stuff planned for the vacation :)


So I found some time over the holiday break to experiment with different threading and memory management schemes, and was able to make a few improvements to the C++ SCA runtime.

The SCA runtime now works with both the prefork (pool of single-threaded processes) and worker (pool of multi-threaded processes) HTTPD MPMs.

That way you get both super-fast dispatch in multi-threaded HTTPD nodes (dedicated to SCA component wiring/routing, for example) and robustness against application failures in pre-forked HTTPD nodes running application code.
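As an illustrative sketch, the two roles could be configured like this on HTTPD 2.2-era servers, where the MPM is chosen at build time (--with-mpm=worker or --with-mpm=prefork); the tuning values below are assumptions, not measured settings:

```apache
# Dispatch node: multi-threaded worker MPM (httpd built --with-mpm=worker)
<IfModule worker.c>
    StartServers        2
    ThreadsPerChild    25
    MaxClients         50
</IfModule>

# Application node: prefork MPM (httpd built --with-mpm=prefork)
# One process per request worker, so a crashing component only
# takes down its own process.
<IfModule prefork.c>
    StartServers        5
    MaxClients         50
</IfModule>
```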

To speed up multi-threaded execution, there are no mutex locks anymore on the main processing path. To eliminate these locks I did two things:

- Replaced the reference counting scheme I was using before for memory collection (which required locks around the ref counters) with a pool-based scheme using HTTPD/APR pools: fast sequential allocs from a pool private to the HTTP request processed by a given thread, and no frees, until HTTPD frees the whole pool at once after processing the request.

- Added simple memory-pool-based, lock-free implementations of strings and input/output streams, replacing the slower STL string and stream equivalents (which also use locks around their reference counters.)

With these changes, performance is now getting really good in both pre-fork and multi-threaded worker HTTPD servers.

On my home Ubuntu server (Core Duo 2.66 GHz) a plain HTTP static GET loopback takes 0.16 msec (measured with a loop hitting a multi-threaded HTTPD from 10 concurrent threads).

An ATOM POST to an SCA component takes 0.19 msec, including SCA wiring/routing to the component, parsing of the ATOM XML entry, and invocation of the component implementation (written in Python or Scheme; that doesn't seem to make a difference).

So that's 0.03 msec for the SCA runtime on top of the 0.16 msec HTTPD baseline... getting pretty fast I think :)

Next, I'd like to measure invocation of SCA components written in C++, and see how well the runtime handles streaming of big payloads (the new string and stream implementations should also help minimize memory copies.)

--
Jean-Sebastien
