Yeah, I saw your tweet… This is really cool. I never bothered to check raw req/sec numbers for my acna11 talk because, as was said in the trafficserver talk, rps is ridiculous anyway ;)
Would love to see this folded into trunk and even 2.4.0 if we have the time and inclination. When I get back to the office (still in SF), I hope to commit some Simple patches that I've been playing around with.

On Nov 13, 2011, at 4:24 PM, Paul Querna wrote:

> I've created a branch event-performance with this change and several
> other changes:
> <https://github.com/pquerna/httpd/compare/trunk...event-performance>
>
> I did some basic benchmarking to validate it, though if anyone has a
> real test lab setup that can throw huge traffic numbers at it that
> would be very helpful.
>
> For the "It works" default index page, I got the following:
>
> event mpm trunk: 15210.07 req/second
> event mpm performance branch: 15775.42 req/second (~4%)
> nginx 0.7.65-1ubuntu2: 12070.35
>
> Event MPM was using a 100% default install configuration, nginx was
> using Ubuntu 10.04.3 LTS default configuration, barring replacement of
> the index.html to match Apache's. I included nginx just to make sure
> that we were in the same ballpark. I imagine a well tuned nginx config
> can go much higher, but the same is true of Apache too :)
>
> As an aside, in either case, using apachebench for this is kinda..
> pointless, it's maxing itself out on a single core, and the machine
> is... idle. I'd be very happy if someone wrote a new benchmarking
> tool and/or revamped flood. sigh.
>
> Will try to do some more profiling and then get this into trunk in the
> next day or two.
>
> On Fri, Nov 11, 2011 at 8:07 PM, Paul Querna <[email protected]> wrote:
>> hi,
>>
>> After r1201149, we now lock for lots of things, where in an ideal
>> case, we shouldn't need it.
>>
>> I'm toying around with ideas on how to eliminate the need for a mutex at all.
>>
>> My current 'best' idea I think:
>>
>> 1) Create a new struct, ap_pollset_operation_and_timeout_info_t, which
>> contains what pollset operation to do (Add, Remove, Flags),
>> timeout_type, the timeout value, and a pointer to the conn_rec.
>>
>> 2) Implement a single-reader, single-writer circular ring buffer of
>> these structures. (Using 2x uint16_t head/end offsets stuffed into a
>> single uint32_t so we can make it portably atomic using
>> apr_atomic_cas32)
>>
>> 3) Allocate this ring buffer per-worker.
>>
>> 4) Have the single Event thread de-queue operations from all the worker
>> threads.
>>
>> This would remove the 4 or 5 separate timeout queues we have
>> developed, and their associated mutex, and basically move all of the
>> apr_pollset operations to the single main thread.
>>
>> Without modification to the event mpm, it would potentially cause some
>> issues as the event thread isn't always waking up that often, but I
>> think we can figure out a way to do this without too much pain (either
>> a pipe trigger to wake up when in 'slow mode', or just lower the
>> default timeout on the pollset from 500ms to like 5ms).
>>
>> Thoughts?
>>
>> Thanks,
>>
>> Paul
