Hi Sam,

>  Processes 1, 2, 3 are running.  1 finishes and requests the mutex, then
>  2 finishes and requests the mutex, then 3 finishes and requests the mutex.
>  So when the next three requests come in, they are handled in the same order:
>  1, then 2, then 3 - this is FIFO or LRU.  This is bad for performance.

Thanks for the explanation; that makes sense now.  So, I was right that
it's OS dependent, but most OSes hand off the accept mutex in FIFO order,
which amounts to LRU selection of the waiting processes.
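For what it's worth, the difference is easy to sketch outside Apache.  This
toy Perl loop has nothing to do with Apache's actual accept-mutex internals;
it just models an idle-worker list, where FIFO hand-off rotates requests
through every worker and LIFO hand-off keeps reusing the most recently
idle one:

```perl
use strict;
use warnings;

# Toy model of an idle-worker list (worker ids 1..5 are made up).
# Returns how many distinct workers end up servicing $n sequential
# requests under each hand-off policy.
sub distinct_workers {
    my ($lifo, $n) = @_;
    my @idle = (1 .. 5);
    my %used;
    for (1 .. $n) {
        my $w = shift @idle;             # take the worker at the front
        $used{$w}++;
        if ($lifo) { unshift @idle, $w } # MRU: return it to the front
        else       { push    @idle, $w } # LRU/FIFO: return it to the back
    }
    return scalar keys %used;
}

printf "FIFO: %d workers, LIFO: %d worker(s)\n",
    distinct_workers(0, 20), distinct_workers(1, 20);
```

With sequential requests, FIFO touches all five workers while LIFO keeps
hammering the same one - which is exactly the memory-locality argument
for MRU.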

Unfortunately, I don't see that being fixed very simply, since it's not
really Apache doing the choosing.  Maybe it will be possible to do
something cool with the wake-one stuff in Linux 2.4 when that comes out.

By the way, how are you doing it?  Do you use a mutex routine that works
in LIFO fashion?

>  > In my experience running
>  > development servers on Linux it always seemed as if the requests
>  > would continue going to the same process until a request came in when
>  > that process was already busy.
> 
>  No, they don't.  They go round-robin (or LRU as I say it).

Keith Murphy pointed out that I was seeing the result of persistent HTTP
connections from my browser.  Duh.

>  But a point I'm making is that with mod_perl you have to go to great
>  lengths to write your code so as to avoid unshared memory.  My claim is that
>  with mod_speedycgi you don't have to concern yourself as much with this.
>  You can concentrate more on the application and less on performance tuning.

I think you're overstating the case a bit here.  It's really easy to take
advantage of shared memory with mod_perl - I just add a 'use Foo' to my
startup.pl!  It can be hard for newbies to understand, but there's nothing
difficult about implementing it.  I often get 50% or more of my
application shared in this way.  That's a huge savings.
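For anyone following along, the startup.pl trick is just preloading:
modules pulled in before Apache forks get shared copy-on-write across all
the children.  A minimal sketch (the module names are placeholders, not a
recommendation):

```perl
# startup.pl -- loaded once in the parent, e.g. via
# "PerlRequire /path/to/startup.pl" in httpd.conf, before Apache forks.
# Anything compiled here is shared copy-on-write by all children.
use strict;

use CGI ();            # example: preload commonly used modules
CGI->compile(':all');  # compile CGI.pm's autoloaded methods up front
use My::App ();        # placeholder for your own application code

1;
```

The win is that the compiled bytecode and constant data for those modules
live in pages the children never write to, so they stay shared.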

>  I don't assume that each approach is equally fast under all loads.  They
>  were about the same at concurrency level 1, but at higher concurrency
>  levels they weren't.

Well, certainly not when mod_perl started swapping...

Actually, there is a reason why MRU could lead to better performance (as
opposed to just saving memory): caching of allocated memory.  The first
time Perl executes a sub it has to allocate memory for its lexicals, so
if you re-use an interpreter that has already run the code you get to
skip that step, and that should give some performance benefit.

>  I am saying that since SpeedyCGI uses MRU to allocate requests to perl
>  interpreters, it winds up using a lot fewer interpreters to handle the
>  same number of requests.

What I was saying is that it doesn't make sense for one to need fewer
interpreters than the other to handle the same concurrency.  If you have
10 requests at the same time, you need 10 interpreters.  There's no way
speedycgi can do it with fewer, unless it actually makes some of them
wait.  That could be happening, due to the fork-on-demand model, although
your warmup round (priming the pump) should take care of that.

>  I don't think it's pre-forking.  When I ran my tests I would always run
>  them twice, and take the results from the second run.  The first run
>  was just to "prime the pump".

That seems like it should do it, but I still think you could only have
more processes handling the same concurrency on mod_perl if some of the
mod_perl processes are idle or some of the speedycgi requests are waiting.

>  > This is probably all a moot point on a server with a properly set
>  > MaxClients and Apache::SizeLimit that will not go into swap.
> 
>  Please let me know what you think I should change.  So far my
>  benchmarks only show one trend, but if you can tell me specifically
>  what I'm doing wrong (and it's something reasonable), I'll try it.

Try setting MinSpareServers as low as possible and setting MaxClients to a
value that will prevent swapping.  Then set ab for a concurrency equal to
your MaxClients setting.
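Concretely, that's something like the following in httpd.conf - the
numbers here are only an example; pick MaxClients so that MaxClients
times your per-process size fits in physical RAM:

```apache
# httpd.conf -- example values only; size MaxClients to physical RAM
MinSpareServers  1
MaxSpareServers  1
StartServers     1
MaxClients      40

# then benchmark at exactly that concurrency, e.g.:
#   ab -n 10000 -c 40 http://localhost/perl/test
```

That way neither server gets to cheat by swapping, and ab is driving both
at the same true concurrency.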

>  I believe that with speedycgi you don't have to lower the MaxClients
>  setting, because it's able to handle a larger number of clients, at
>  least in this test.

Maybe what you're seeing is an ability to handle a larger number of
requests (as opposed to clients) because of the performance benefit I
mentioned above.  I don't know how hard ab tries to make sure you really
have n simultaneous clients at any given time.

>  In other words, if with mod_perl you had to turn
>  away requests, but with mod_speedycgi you did not, that would just
>  prove that speedycgi is more scalable.

Are the speedycgi+Apache processes smaller than the mod_perl
processes?  If not, the maximum number of concurrent requests you can
handle on a given box is going to be the same.

>  Maybe.  There must be a benchmark somewhere that would show off
>  mod_perl's advantages in shared memory.  Maybe a 100,000 line perl
>  program or something like that - it would have to be something where
>  mod_perl is using *lots* of shared memory, because keep in mind that
>  there are still going to be a whole lot fewer SpeedyCGI processes than
>  there are mod_perl processes, so you would really have to go overboard
>  in the shared-memory department.

Well, I get tons of use out of shared memory without even trying.  If you
can find a way to implement it in speedycgi, I think it would be very
beneficial to your users.

>  I would rather just see mod_perl fixed if that's
>  possible.

Because this has more to do with the OS than Apache and is already fixed
in mod_perl 2, I doubt anyone will feel like messing with it before that
gets released.  Your experiment demonstrates that the MRU approach has
value, so I'll be looking forward to trying it out with mod_perl 2.

- Perrin
