> This doesn't affect the argument, because the core of it is that:
 > 
 > a) the CPU will not completely process a single task all at once; instead,
 > it will divide its time _between_ the tasks
 > b) tasks do not arrive at regular intervals
 > c) tasks take varying amounts of time to complete
 > 
 > Now, if (a) were true but (b) and (c) were not, then, yes, it would have the
 > same effective result as sequential processing. Tasks that arrived first
 > would finish first. In the real world however, (b) and (c) are usually true,
 > and it becomes practically impossible to predict which task handler (in this
 > case, a mod_perl process) will complete first.

 I'll agree with (b) and (c) - I ignored them to keep my analogy as simple
 as possible.  Again, the goal of my analogy was to show that a stream of
 10 concurrent requests can be handled with the same throughput by a lot
 fewer than 10 perl interpreters.  (b) and (c) don't really have an effect
 on that - they don't control the order in which processes arrive and get
 queued up for the CPU.

 I won't agree with (a) unless you qualify it further - what do you claim
 is the method or policy for (a)?

 There's only one run queue in the kernel.  The first task ready to run is
 put at the head of that queue, and anything arriving afterwards waits.
 Only if that first task blocks on a resource, takes a very long time, or
 gets pre-empted because an interrupt makes a higher-priority process
 runnable does it give up the CPU to the next process in the queue.
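That single-run-queue behavior can be sketched as a toy model (Python, purely illustrative; the task names and times are made up, and the 210ms timeslice is the figure cited later in this message):

```python
from collections import deque

TIMESLICE = 210  # ms; illustrative Linux timeslice

def schedule(tasks):
    """tasks: list of (name, cpu_ms) in arrival order.
    Models a single run queue: the head task runs until it finishes
    or its timeslice expires; a pre-empted task is re-queued at the back."""
    queue = deque(tasks)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining <= TIMESLICE:
            finished.append(name)                        # ran to completion
        else:
            queue.append((name, remaining - TIMESLICE))  # pre-empted
    return finished

# Tasks shorter than one timeslice finish strictly in arrival order:
print(schedule([("req1", 50), ("req2", 50), ("req3", 50)]))  # ['req1', 'req2', 'req3']
# A long task is pre-empted and finishes after a later, shorter arrival:
print(schedule([("req1", 400), ("req2", 50)]))               # ['req2', 'req1']
```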

 It is inefficient for the unix kernel to be constantly switching
 very quickly from process to process, because it takes time to do
 context switches.  Also, unless the processes share the same memory,
 some amount of the processor cache can get flushed when you switch
 processes because you're changing to a different set of memory pages.
 That's why it's best for overall throughput if the kernel keeps a single
 process running as long as it can.
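To make the switching cost concrete, here's a rough Python model; the 5ms per-switch cost is an invented illustrative number, not a measurement, and the model ignores the cache-flush effect entirely:

```python
from collections import deque

SWITCH_COST = 5  # ms per context switch -- illustrative, not measured

def total_time(tasks_ms, quantum):
    """Wall time to finish all tasks under round-robin scheduling,
    charging SWITCH_COST whenever the CPU moves to a different task."""
    queue = deque(enumerate(tasks_ms))
    t = 0
    last = None
    while queue:
        tid, work = queue.popleft()
        if last is not None and tid != last:
            t += SWITCH_COST                     # resuming a different task
        t += min(work, quantum)
        if work > quantum:
            queue.append((tid, work - quantum))  # pre-empted, re-queued
        last = tid
    return t

tasks = [100] * 4
print(total_time(tasks, quantum=1000))  # run-to-completion: 415
print(total_time(tasks, quantum=10))    # constant switching: 595
```

Same total work either way, but the fine-grained slicing pays for 39 switches instead of 3.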

 > Similarly, because of the non-deterministic nature of computer systems,
 > Apache doesn't service requests on an LRU basis; you're comparing SpeedyCGI
 > against a straw man. Apache's servicing algorithm approaches randomness, so
 > you need to build a comparison between forced-MRU and random choice.

 Apache httpd's are scheduled on an LRU basis.  This was discussed early
 in this thread.  Apache uses a file-lock for its mutex around the accept
 call, and file-locking is implemented in the kernel using a round-robin
 (fair) selection in order to prevent starvation.  This results in
 incoming requests being assigned to httpd's in an LRU fashion.
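A minimal model of that fair-mutex behavior (Python; the process names are made up, and for simplicity each process is assumed to finish and rejoin the wait queue before the next accept):

```python
from collections import deque

def assign_lru(idle_httpds, n_requests):
    """A fair (FIFO) mutex queue: each incoming request is accepted by
    the process that has been waiting longest -- i.e. LRU assignment."""
    waiters = deque(idle_httpds)
    assignments = []
    for _ in range(n_requests):
        h = waiters.popleft()   # longest-idle httpd wins the accept
        assignments.append(h)
        waiters.append(h)       # after serving, it rejoins at the back
    return assignments

print(assign_lru(["httpd1", "httpd2", "httpd3"], 5))
# requests rotate through every process instead of reusing one:
# ['httpd1', 'httpd2', 'httpd3', 'httpd1', 'httpd2']
```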

 Once the httpd's get into the kernel's run queue, they finish in the
 same order they were put there, unless they block on a resource, get
 timesliced or are pre-empted by a higher priority process.

 > Thinking about it, assuming you are, at some time, servicing requests
 > _below_ system capacity, SpeedyCGI will always win in memory usage, and
 > probably have an edge in handling response time. My concern would be, does
 > it offer _enough_ of an edge? Especially bearing in mind, if I understand,
 > you could end up running anywhere up to 2x as many processes (n Apache handlers + n
 > script handlers)?

 Try it and see.  I'm sure you'll run more processes with speedycgi, but
 you'll probably run a whole lot fewer perl interpreters and need less ram.
 
 Remember that the httpd's in the speedycgi case will have very little
 un-shared memory, because they don't have perl interpreters in them.
 So the processes are fairly indistinguishable, and the LRU isn't as 
 big a penalty in that case.

 This is why the original designers of Apache thought it was safe to
 create so many httpd's.  If they all have the same (shared) memory,
 then creating a lot of them does not have much of a penalty.  mod_perl
 applications throw a big monkey wrench into this design when they add
 a lot of unshared memory to the httpd's.

 > > No, homogeneity (or the lack of it) wouldn't make a difference.  Those
 > > 3rd, 5th or 6th processes run only *after* the 1st and 2nd have finished
 > > using the CPU.  And at that point you could re-use those interpreters
 > > that 1 and 2 were using.
 > 
 > This, if you'll excuse me, is quite clearly wrong. See the above argument,
 > and imagine that tasks 1 and 2 happen to take three times as long to
 > complete as 3; you should see that they could all end up being in
 > the scheduling queue together. Perhaps you're considering tasks which are
 > too small to take more than 1 or 2 timeslices, in which case, you're much
 > less likely to want to accelerate them.

 So far, to keep things fairly simple, I've assumed each request takes less
 than one timeslice to run.  A timeslice is fairly long on a Linux PC (210ms).

 But say they take two slices, and interpreters 1 and 2 get pre-empted and
 go back into the queue.  So then requests 5/6 in the queue have to use
 other interpreters, and you expand the number of interpreters in use.
 But still, you'll wind up using the smallest number of interpreters
 required for the given load and timeslice.  As soon as those 1st and
 2nd perl interpreters finish their run, they go back at the beginning
 of the queue, and the 7th/8th or later requests can then use them, etc.
 Now you have a pool of maybe four interpreters, all being used on an MRU
 basis.  But it won't expand beyond that set unless your load goes up or
 your program's CPU time requirements increase beyond another timeslice.
 MRU will ensure that whatever the number of interpreters in use, it
 is the lowest possible, given the load, the CPU-time required by the
 program and the size of the timeslice.
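That MRU reuse policy can be sketched the same way (Python; the event stream below is a made-up load pattern with at most two requests in flight at once):

```python
def mru_pool(events):
    """events: 'start'/'done' markers in time order.  Freed interpreters
    go on top of a stack, so a new request reuses the most recently
    used one; the pool only grows when every interpreter is busy."""
    free = []       # stack of idle interpreters, MRU on top
    busy = []
    created = 0
    for ev in events:
        if ev == "start":
            if free:
                busy.append(free.pop())  # reuse the MRU interpreter
            else:
                created += 1             # all busy: pool must grow
                busy.append(created)
        else:
            free.append(busy.pop(0))     # oldest in-flight request finishes
    return created

# 10 requests total, but never more than 2 in flight at once:
events = ["start", "start"] + ["done", "start"] * 8 + ["done", "done"]
print(mru_pool(events))  # only 2 interpreters ever created
```

The pool size tracks the peak concurrency, not the total request count.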
