Jean-Sebastien Delfino wrote:
I'm on vacation, so if I find time in the next few days I'm going to do some coding in sca-cpp
...


I'd also like to try and compare sca-cpp's performance with the HTTPD multi-threaded 'worker' and 'prefork' modules... not sure I'll get to it though, as I have a lot of other non-computer-related stuff planned for the vacation :)


So I found some time over the holiday break to experiment with different threading and memory management schemes, and was able to make a few improvements to the C++ SCA runtime.

The SCA runtime now works with both the prefork (pool of single-threaded processes) and worker (pool of multi-threaded processes) HTTPD MPMs.

That way you get both super-fast dispatch in multi-threaded HTTPD nodes (dedicated to SCA component wiring/routing, for example) and robustness against application failures in pre-forked HTTPD nodes running application code.
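As an illustrative sketch, the two roles could be configured like this on HTTPD 2.2-era servers, where the MPM is chosen at build time (--with-mpm=worker or --with-mpm=prefork); the tuning values below are assumptions, not measured settings:

```apache
# Dispatch node: multi-threaded worker MPM (httpd built --with-mpm=worker)
<IfModule worker.c>
    StartServers        2
    ThreadsPerChild    25
    MaxClients         50
</IfModule>

# Application node: prefork MPM (httpd built --with-mpm=prefork)
# One process per request worker, so a crashing component only
# takes down its own process.
<IfModule prefork.c>
    StartServers        5
    MaxClients         50
</IfModule>
```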

To speed up multi-threaded execution, there are no mutex locks anymore on the main processing path. To eliminate these locks I did two things:

- Replaced the reference counting scheme I was using before for memory collection (which required locks around the ref counters) with a pool-based scheme using HTTPD/APR pools: fast sequential allocs from a pool private to the HTTP request processed by a given thread, and no frees, until HTTPD frees the whole pool at once after processing the request.

- Added simple memory-pool-based, lock-free implementations of strings and input/output streams, replacing the slower STL string and stream equivalents (which also use locks around their reference counters.)

With these changes, performance is now getting really good in both pre-fork and multi-threaded worker HTTPD servers.

On my home Ubuntu server (Core Duo 2.66 GHz) a plain HTTP static GET loopback takes 0.16 msec (measured with a loop hitting a multi-threaded HTTPD from 10 concurrent threads).

An ATOM POST to an SCA component takes 0.19 msec, including SCA wiring/routing to the component, parsing of the ATOM XML entry, and invocation of the component implementation (written in Python or Scheme; that doesn't seem to make a difference).

So that's 0.03 msec for the SCA runtime on top of the 0.16 msec HTTPD baseline... getting pretty fast I think :)

Next, I'd like to measure invocation of SCA components written in C++, and see how well the runtime handles streaming of big payloads (the new string and stream implementations should also help minimize memory copies.)

--
Jean-Sebastien
