splitting the code sets across more than one mod_perl server

Stas Bekman Wed, 13 Dec 2000 07:09:03 -0800
I was going through some math fixes in the guide, which were spotted
(surprisingly) during my early morning class at YAPC::Europe. While
re-reading the material I decided to draw your attention to one point:
it may conserve much more memory not to pull all the code that you have
into the same mod_perl server, if you can identify at least two code sets
that are big and share very little *between* them.

I've just corrected this material, and the online version has some minor
glitches in the math, so I'm posting it in full. Hope you find it
useful:

BTW, the formula that I use below is derived from:

               Total_RAM + Shared_RAM_per_Child * (MaxClients - 1)
  MaxClients = ---------------------------------------------------
                              Max_Process_Size

(which is still wrong in the online version, but that hardly affects
the final results :)
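
To see how the closed form used below follows from that equation:
multiply both sides by Max_Process_Size, move the
Shared_RAM_per_Child * MaxClients term to the left, and divide by
(Max_Process_Size - Shared_RAM_per_Child). The following is only a
small sketch in Python that checks the two forms agree; the function
name and the figures are made up for illustration and are not taken
from the guide.

```python
def max_clients(total_ram, shared_per_child, max_process_size):
    """Closed form: (Total_RAM - Shared) / (Max_Process_Size - Shared)."""
    return (total_ram - shared_per_child) / (max_process_size - shared_per_child)

# Hypothetical figures in MB, chosen only so that the division is exact.
total_ram, shared, size = 400, 4, 10
n = max_clients(total_ram, shared, size)

# Plug n back into the right-hand side of the implicit equation.
rhs = (total_ram + shared * (n - 1)) / size
print(n, rhs)  # 66.0 66.0
```

Both values match, so the closed form is a solution of the implicit
equation for these inputs.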

=head1 Running More than One mod_perl Server on the Same Machine

Let's assume that you have two different sets of code which have
little or nothing in common--different Perl modules, no code
sharing. Typical numbers can be four megabytes of unshared and four
megabytes of shared memory for each code set, plus three megabytes of
shared basic mod_perl stuff. This makes each process 19MB in size
when the two code sets are loaded: 3MB (server core shared) + 4MB
(shared 1st code set) + 4MB (unshared 1st code set) + 4MB (shared
2nd code set) + 4MB (unshared 2nd code set). Under this scenario:

   Shared_RAM_per_Child :  11MB
   Max_Process_Size     :  19MB
   Total_RAM            : 251MB

We assume that four megabytes is the size of each code set's unshared
memory. This is a pretty typical size of unshared memory, especially
when connecting to databases, as the database connections cannot be
shared. Databases like Oracle can take even more RAM per connection on
top of this.

Let's assume that we have 251 megabytes of RAM dedicated to the
webserver.

According to the equation developed in the section: "L<Choosing
MaxClients|performance/Choosing_MaxClients>":

                    Total_RAM - Shared_RAM_per_Child
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child


  MaxClients = (251 - 11)/(19-11) = 30

We see that we can run 30 processes, using the given memory and the
two code sets in the same server.

Now consider this practical decision. Since we have recognized that
the code sets are very distinct in nature and there is no significant
memory sharing between them, the wise thing to do is to split the two
code sets between two mod_perl servers (a single mod_perl server is
actually a parent process plus a number of child processes). So
instead of running everything on one server, we now move the second
code set onto another mod_perl server. At this point we are still
talking about a single machine.

Let's look at the figures again. After the split we will have 15
servers of eleven megabytes (4MB unshared + 7MB shared) and another 15
of the same kind.

How much memory do we need now? From the above equation we derive:

  Total_RAM = MaxClients * (Max_Process_Size - Shared_RAM_per_Child)
              + Shared_RAM_per_Child

And using the numbers (a total of 30 servers):

  Total_RAM = 2 * (15*(11-7)+7) = 134

A total of 134MB of memory is required. But, hey, we have 251MB of
memory. We've got 117MB of memory freed up. If we recalculate
C<MaxClients>, remembering that each of the two servers now carries
its own 7MB of shared memory, we will see that we can run almost 60
servers:

  MaxClients = (251 - 7*2)/(11-7) = 59

So we can run about 29 more servers using the same memory size. Almost
30 servers for each code set instead of 15 originally. We have
doubled the servers pool without changing the machine's hardware.
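
The split-server arithmetic can be sketched in a few lines of Python.
This is only an illustration: the function names C<capacity> and
C<ram_needed> are hypothetical, and the sketch assumes every server
pays for its shared memory exactly once while each child adds only its
unshared part.

```python
def capacity(total_ram, shared_per_server, process_size, servers=1):
    """Whole processes that fit when `servers` identical servers share the RAM."""
    unshared = process_size - shared_per_server
    return (total_ram - servers * shared_per_server) // unshared

def ram_needed(children, shared_per_server, process_size, servers=1):
    """RAM in MB used by `servers` servers running `children` processes each."""
    unshared = process_size - shared_per_server
    return servers * (children * unshared + shared_per_server)

# Split setup from the text: two servers, 11MB processes, 7MB shared each.
total = capacity(251, 7, 11, servers=2)
print(total)  # 59

# 59 processes divided as 30 + 29 between the two servers.
print(ram_needed(30, 7, 11) + ram_needed(29, 7, 11))  # 250
```

The second figure confirms that 59 processes fit within the 251MB
budget with one megabyte to spare.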

Moreover, this new setup allows us to fine-tune the two code sets
separately. In reality the smaller code base might have a higher hit
rate, so we can benefit even more.

Let's assume that, based on the usage statistics, we know that the
first code set is called in 70% of requests and the second in the
other 30%. Now we assume that the first code set requires only
5MB of RAM (3MB shared plus 2MB unshared) over the basic mod_perl
server size, and the second set needs 11MB (7MB shared and 4MB
unshared).

Let's compare this new requirement with our original 50:50 setup (where
we assigned the same number of clients to each code set).

So now the processes of the first mod_perl server, running the first
code set, will each use 8MB (3MB server shared + 3MB code shared +
2MB code unshared), and those of the second 14MB (3+7+4). Given that
we have a 70:30 hit ratio and 251MB of available memory, we have to
solve these two equations:

  X/Y = 7/3

  X*(8-6) + 6 + Y*(14-10) + 10 = 251

where X is the total number of processes the first code set can
use and Y the number for the second. The first equation reflects the
70:30 hit ratio. The second applies the total memory equation to each
server: the number of processes times the unshared memory per process
(8-6 and 14-10), plus each server's shared memory (6 and 10).

When we solve these equations and round down, we find that X equals
63 and Y equals 27. So we have a total of 90 servers, more than twice
the number of servers running in the original single-server setup,
using the same memory size.
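
For anyone who wants to replay this with their own figures, here is a
minimal Python sketch of solving the two equations. The function name
C<split_by_hit_ratio> and its argument layout are made up for this
example. It substitutes X = ratio * Y into the memory equation, solves
for Y, and rounds both results down so the total stays within the
available RAM.

```python
from fractions import Fraction
from math import floor

def split_by_hit_ratio(total_ram, ratio, first, second):
    """Return (X, Y) process counts for two servers.

    ratio  -- desired X/Y, e.g. Fraction(7, 3) for a 70:30 hit ratio
    first  -- (process_size, shared) in MB for the first server
    second -- (process_size, shared) in MB for the second server
    """
    unshared_1 = first[0] - first[1]
    unshared_2 = second[0] - second[1]
    # RAM left once both servers have paid for their shared memory.
    available = total_ram - first[1] - second[1]
    # X*unshared_1 + Y*unshared_2 = available, with X = ratio * Y.
    y = Fraction(available) / (ratio * unshared_1 + unshared_2)
    x = ratio * y
    return floor(x), floor(y)

x, y = split_by_hit_ratio(251, Fraction(7, 3), (8, 6), (14, 10))
print(x, y, x + y)  # 63 27 90
```

Exact fractions are used instead of floats only to keep the rounding
step free of floating point surprises.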

The hit-rate-optimized solution, together with the fact that the code
sets can differ in their memory requirements, allowed us to run 30
more servers in total and gave us 33 more servers (63 versus 30) for
the most wanted code base, relative to the simple 50:50 split in the
first example.

Of course if you identify more than two distinct sets of code based on
your hit rate statistics, more complicated solutions may be required.
You could make even more splits and run three or more mod_perl
servers.

Remember that having too many running processes doesn't necessarily
mean better performance, because all of them will contend for CPU time
slices. The more processes that are running, the less CPU time each
gets and the slower overall performance will be. Therefore, after
hitting a certain load, you might want to start spreading servers over
different machines.

In addition to the obvious memory saving, you gain the ability to
troubleshoot problems more easily when different components run on
different servers. It's quite possible that a small change in the
server configuration, made to fix or improve something for one code
set, might completely break the second code set. For example,
upgrading the first code set may require an update of some modules
that both code bases rely on, and there is a chance that the second
code set won't work with the new version of a module it was relying
on.





_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://logilune.com/
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/