Hi, I've just emerged from about 5 months of fairly continuous late-night
development. What started as a few Berkeley DB files and a tiny
CGI Perl script has now grown to (for me) a monster size, kept largely
at bay by modperl..

I hope this story will be interesting to those who are taking their first
steps, and forgive any idiocy or simple wrong-headedness, because
I was too dumb to RTFM (or rather, RTFMLArchive), so I learned some
things the hard way, and am still learning.

The site is not a commercial one, which is why I am free to
ramble on about it.. I am sure there are far more impressive modperl
sites now under NDAs or corporate lawyer mufflers, but without those
to read, you get mine ;)

As it stands, the site is entirely dynamically generated, and visited
by 4 to 10 thousand people a day. I wish that I could say that I read
the modperl FAQ and wrote it to handle that load from day one, but
that was not the case:

My understanding of why modperl was cool came tangentially from
Philip Greenspun on photo.net rather single-mindedly promoting, albeit
with persuasion, the AOLserver... the story that made the most impression
was the Bill Gates Personal Wealth Clock.. after getting onto the Netscape
home page for a day, he had to cope with 2 hits per second that needed code
run, which proved the point neatly that a fork/exec/cgi isn't going to cut
it. At the time, 2 hits a second seemed pretty impressive.. my god, that's
two people per SECOND.. I mean, oh what, 7,200 per hour? 72,000 during
daylight? well anyway, I digress.

Using AOLserver wasn't an option for me, personally, because I am too tired
to get back into Tcl and stub my toe 10 times a week once the programs
got so big that side-effects ruled.. I already thought Apache was the
Swiss Army knife of webservers, Perl 5 and CPAN were already my home, and
the fact that Slashdot used it (and modperl, as I slowly discovered)
just cemented the choice.

But just when you've climbed a peak you discover whole new and steeper
slopes ahead...

First quandary

I had no idea what traffic for my website was likely to be, and even
less idea what that translated to in terms of bandwidth.. these things
they don't teach you in CNET! I wince to remember that at one point,
misunderstanding a Malda comment about an ISDN line to their house,
I even asked in wonderment if they hosted Slashdot on 128kbps!
The relationship between visitors, pages, and bandwidth does not seem
to be well understood: asking at some high-powered places, including
Philip's site, got me no good answers.

Well, I figured, bandwidth is at least something you can get more of if
you need it and want to pay, so I decided to watch it, and let that
one answer itself. (more on that later).

The first attempts.

Initially I set up Squid in accelerator mode and looked at the result,
but during development hits were few and bugs were many, so dealing
with an extra layer of config files and logs was too much. I put Squid
behind 'in case of emergency break glass', and switched back to
straight Apache/modperl.

Thanks to the excellent online Apache modperl book, which I think back in
June was still being put together, and particularly the performance
tuning guide, I discovered how to set up a startup.pl that gave nice
backtraces for errors (the Carp module and code fragment), plus I managed
to navigate the global variable problem, and always used strict..
to this day, though, I find having a small set of global variables
that need to be shared amongst some modules to be somewhat of a pain..
my current solution of putting them all in a hash stored in
another module is OK, but kind of annoys me for some reason.
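
For what it's worth, the relevant bits look roughly like this -- the Carp
fragment is the one from the guide, while the module and key names are
made up for illustration:

    # startup.pl fragment -- warnings become full stack backtraces
    use strict;
    use Carp ();
    $SIG{__WARN__} = \&Carp::cluck;

    # MySite/Config.pm -- the "hash in another module" holding shared globals
    package MySite::Config;
    use strict;
    use vars qw(%CONF);
    %CONF = (
        db_dsn   => 'dbi:mysql:mysite',
        db_user  => 'web',
        per_page => 25,
    );
    1;

    # any module that needs the globals:
    #   use MySite::Config;
    #   my $dsn = $MySite::Config::CONF{db_dsn};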

Newbie mistakes

Watching my machine choke when a syntax error caused Apache processes to
pile up until swap was exhausted was not fun, especially on a live server,
so I became paranoid about syntax checking.. a vimrc from Dejanews
made a great Perl IDE, allowing me to ':make' and get taken to each
error as I put them in, rather than in a batch at the end from the
command line. This hack, I think, was my personal favourite of 1999
(so sue me, I'm a vi bigot.. I can't/won't sit in Emacs all day).
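
I don't have the original Dejanews post to hand, but the idea is along
these lines (treat it as a sketch rather than the exact recipe):

    " run the buffer through perl -c and jump to each error with :make / :cn
    set makeprg=perl\ -cw\ %
    set errorformat=%m\ at\ %f\ line\ %l%.%#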

DB files are no way to run a war

Starting with DB files was probably the silliest decision I made. They
work fine in testing, but when you put pressure on them (i.e. "go live"),
they start to fall apart. Elaborate locking/unlocking code fragments exist,
but I don't trust them, and have the corrupted DBs to prove it.

MySQL goes with modperl like sushi and rice

I can't say enough good things about MySQL. Of literally all the Lego
I've used in this web site, this has been rock solid... Despite doing
at times ridiculously silly things with tables, rows and fields, it
has never ONCE crashed or corrupted the data. At my place of employ
you cannot say that about any commercial database they use, and they
are actually under less stress, plus have DBAs running around keeping
them on course... Oracle may not corrupt, but I'm damn sure it has been
known to lock up, or crash..
Anyway, use MySQL and modperl, write some helper routines to get round
the prepare/execute/finish thing, and you've got practically infinite
persistent shared memory, and from the top of your modperl server code,
you've got open and ready DB handles, near-zero latency to data.
End of that story.
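
By "helper routines" I mean something like this minimal sketch -- the
package name, connect arguments and query are invented, and it assumes
Apache::DBI is loaded from startup.pl so connect() hands back a cached,
already-open handle in each child:

    package MySite::DB;
    use strict;
    use DBI ();

    use vars qw($dbh);

    sub dbh {
        # with Apache::DBI this is nearly free after the first request
        $dbh ||= DBI->connect('dbi:mysql:mysite', 'web', 'secret',
                              { RaiseError => 1 });
        return $dbh;
    }

    # run a SELECT, hand back an arrayref of row hashrefs
    sub select_all {
        my ($sql, @bind) = @_;
        my $sth = dbh()->prepare($sql);
        $sth->execute(@bind);
        my @rows;
        while (my $row = $sth->fetchrow_hashref) {
            push @rows, $row;
        }
        $sth->finish;
        return \@rows;
    }
    1;

    # caller:
    #   my $stories = MySite::DB::select_all(
    #       'SELECT id, title FROM stories WHERE author_id = ?', $uid);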

For want of a byte, the war was lost.

Memory under modperl has taken me the most time to understand to a level
where I am comfortable with scaling issues.. At first sight, it doesn't
seem so bad.. so, your startup.pl-enhanced modperl httpd process
(httpd heavy) is about 8MB instead of 600k, but I had half a gig,
and few hits.. however, the seeds of doom are already planted..

Memory: the obvious mistakes

Sooner or later, when writing a load of dynamic site code, you come to
a point where you want to run some kind of thing that depends on outside
resources. Maybe it queries other servers, FTPs a file, or makes dinner,
whatever.. who cares.. the point is, it takes from zero to infinite TIME
to do this.. the user may get bored and press stop, or they may not,
but your httpd is sitting on 10+MB of prime real estate and it is hung
on a connect, or a lock, or a wait, or a sleep.. this is where you get into
trouble fast.. Apache will spawn extra processes, and one in ten of those
could get hung up also. If the Bill Gates Personal Wealth Clock has to call
Bill to find out how much he is worth, and Bill is sometimes too busy
to call back, some of those two hits a second are going to steal 10MB from
your main memory and not give it back... as memory gets paged out, your
machine starts to choke on paging activity, users start to hit refresh,
and it just gets plain ugly. The solution? Don't get in the way of a
modperl httpd serving a request. Do what the phone company does.. if you
have to do some lengthy work, implement a queue, a queue processor, and
some hold music. That way, you process as fast as you can process, and
people who are in a hurry leave and don't slow down those who are prepared
to wait.
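
The queue can be as humble as a table and a cron'd script; a sketch, with
the table, columns, and do_the_slow_thing() all invented for illustration:

    # in the request handler: file the job and return at once
    sub queue_slow_job {
        my ($dbh, $user_id, $task) = @_;
        $dbh->do(
            'INSERT INTO job_queue (user_id, task, status, queued_at)
             VALUES (?, ?, ?, NOW())',
            undef, $user_id, $task, 'pending',
        );
        # the page that goes back just says "working on it, check back soon"
    }

    # a separate plain-perl worker (cron job or daemon, NOT under Apache)
    # drains the queue at whatever rate the outside resource can stand
    sub run_queue {
        my ($dbh) = @_;
        while (my $job = $dbh->selectrow_hashref(
            "SELECT * FROM job_queue WHERE status = 'pending'
             ORDER BY queued_at LIMIT 1")) {
            do_the_slow_thing($job);
            $dbh->do('UPDATE job_queue SET status = ? WHERE id = ?',
                     undef, 'done', $job->{id});
        }
    }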

Memory: more mistakes

For various reasons, my main site program was available at two different
URLs, as symlinks on disk to the same file... this was a mistake that
I only just found (duh!).. modperl will gobble up any unique URL into
memory that it sees -- so you end up with TWO identical compiled scripts
eating memory.. visible in the 'perl-status' Apache URL, which shows you the
(munged) names of the modules that have been digested to date..
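
One way to avoid it, assuming the old URL can simply bounce visitors to
the new one rather than serve the page itself (the paths here are
placeholders):

    # httpd.conf: keep one canonical URI so the script is compiled once,
    # instead of once per symlinked path
    Redirect permanent /old/myscript.pl http://www.example.com/main/myscript.pl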

Memory: yet more mistakes

After a while, each httpd got bigger. Initially it jumped by a lot, then
would rise slowly over time. This bugged the hell out of me, and I tried
a couple of ways to set soft limits on memory use, until I discovered
that MaxRequestsPerChild could be set to, say, 100, and that solved that
problem for a while.. people often put slowly growing VM space down
to perl code that is written "wrongly", but I never saw a good
explanation of how you can write perl code to have a memory leak in it..
Oh: here is a good tip for memory monitoring: at the beginning of
the cgi, read /proc/$$/status and remember the \d+ in front of VmData,
then do the same at the end. If the difference is more than N kilobytes, add
a line to /tmp/$$.hog that prints the URL, size before and size after..
then you can come back in a day and chase down the code that needs to
allocate such large chunks of memory, whether or not it frees them.
MySQL also has a cool "SELECT ... LIMIT offset, count" which is great for
doing paginated stuff.
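
In code, the VmData check looks something like this (the 200k threshold is
arbitrary, and it assumes the usual CGI environment so the URL is sitting
in $ENV{REQUEST_URI}):

    sub vm_data_kb {
        open(my $fh, "/proc/$$/status") or return 0;
        while (<$fh>) {
            return $1 if /^VmData:\s*(\d+)/;
        }
        return 0;
    }

    my $before = vm_data_kb();
    # ... the real work of the request happens here ...
    my $after = vm_data_kb();

    if ($after - $before > 200) {          # grew by more than 200k? log it
        open(my $log, ">>/tmp/$$.hog") or die "can't append to hog file: $!";
        print $log "$ENV{REQUEST_URI} $before -> $after\n";
        close $log;
    }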

Memory: still more mistakes

Unlike a lightweight httpd, httpd heavy does not benefit much from Apache
varying the number of servers.. if you own the whole box, it is
better to simply decide how much memory you are going to allocate to
Apache, then pick Start, Min and Max to be at or close to that number
of httpds.. despite the dire warnings in the Apache docs about MaxClients
being set too low, you do not want Apache spawning 10 more httpds when
20 are already busy and resident memory is full. That just makes the
machine page itself so hard that the 20 already serving requests run slower
still, with longer delays.. more people coming.. more httpds get spawned..
the vicious circle can only end with the server wallowing between
appearing dead and briefly active again, giving your site a
very sick look to everyone, rather than maintaining requests per minute
at or near the max, and simply being randomly slower for some people.
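
Concretely, that means pinning the pool size; the numbers below are only
examples -- work backwards from the RAM you are willing to give Apache
divided by your per-httpd size:

    # httpd.conf for the modperl-heavy server
    StartServers         20
    MinSpareServers      20
    MaxSpareServers      20
    MaxClients           20
    MaxRequestsPerChild 100

At 20MB per httpd that nominally commits about 400MB (less in practice,
since much of it is shared between children), which is the whole point:
decide it up front rather than letting load decide it for you.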

Memory: and yet more mistakes

Don't use KeepAlive for modperl heavy.. do the docs say that? The stock
config ships with it on, so turn it off yourself.. keepalive on a heavy
httpd means you must have more memory than sense... each httpd that has
served a request sits around for some
predefined time (15 seconds, 30, a minute, whatever), waiting in case
the same user wants another page. Whilst it is doing that, it ain't
free.. it's marked "K" in server-status, and Apache needs more "_" so
it spawns them. At 10 httpds per 128MB memory board, you're going to run
out of bucks before your users finish with you.
Whilst we are on the subject of hanging around, look closely at your
timeout values for send and receive.. the defaults are a little, er,
generous.. if you look at server-status a lot, you may start to notice
some "W"s that don't move.. they are waiting for the timeout
to expire on requests that will never finish.. unless you know your
users are on very slow links, why wait 300 seconds for a request to
finish?
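
Again for the heavy server only, the relevant lines are short (60 is
just a value I find reasonable):

    # the stock Timeout of 300 seconds is generous when each waiting
    # httpd costs 10-20MB; KeepAlive only multiplies the problem
    KeepAlive Off
    Timeout   60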

Code size and memory

Memory wasn't such a huge problem when the code was small, but I kept
adding and adding, and often not disabling old sections.. now the program
is up to 25,000 lines of perl, 75% in one file and 25% in modules.
This eats serious memory! What starts out as about a 10MB httpd jumps
straight to 18MB upon first execution! _that_ is why most of this article
is about memory. It isn't totally clear to me why perl needs 8+MB of
memory to "compile" 25,000 lines of perl that take up something like 500k
of text.. the size of the mother-ship httpd with just a simple startup.pl
that only pulls in DBD/DBI and a few small modules also implies that
another 20,000 lines of code lurk in there. But perl needs what it
needs.. you have to live with that.
It also isn't clear to me how to arrange for certain modules NOT to be
kept precompiled and hogging memory. I might want to write long, complex
but infrequently used modules.. how does one say that this use is
transient and should not be cached? I should probably RTFM again.
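
The nearest thing I know of is a plain run-time require inside the branch
that needs the module, so only the children that actually run that code
pay for it -- a sketch, with a hypothetical module name:

    sub monthly_report {
        # loaded on first use, in this child only, not in the parent httpd
        require MySite::BigReport;
        return MySite::BigReport::generate(@_);
    }

Once a child has pulled it in it stays there until MaxRequestsPerChild
recycles the process, but at least the copy isn't in every child from birth.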

IMAGES are important

important to make your website look good, but also vital in memory
management and modperl.. for 99.9% of modperl websites, 1 in 5 hits
is dynamic, the rest are just static! I can't repeat this enough..
even 100% dynamic websites will need a collection of GIFs, and they
will account for the majority of hits, if not traffic. Don't forget,
many browsers constantly send conditional "fetch this image if it's newer
than time XX" GETs and come back empty-handed, but that is still a hit,
and whilst that hit is getting processed, Mr httpd modperl heavy 10-20MB
is wasting your memory doing something trivial.
Browsers also like to send out groups of 4 requests at once, and I am
sure those with faster connections are soon going to discover they can
tweak the browser to ignore the RFCs and send out 8 or 16 requests at
once for images... so that's 4+ httpd heavies camped out over 80MB of
resident memory, for a second or more... just for one blinking user!
Squid in accelerator mode is one solution, since it will snarf the image
and cache it (or not), but the important point is the httpd's job is
over in a blink, rather than dribbling it out over a slow modem link
to your site user.. I have still resisted the Squid layer (dumb
stubbornness I think), and instead got myself another IP address on the
same interface card, bound the smallest, most lightweight separate
Apache to it that I could make, and prefixed all image requests with
http://1.2.3.4/.. voila. That was the single biggest jump in throughput
that I discovered.

httpd light and images

In this case, turn KeepAlive ON if you are not really tight on memory..
it really helps your users, who can reuse the open pipe to httpd light
to get 4 images in a row rather than open/close, open/close.. If you
are used to struggling with 20 httpds taking 20MB apiece, having
40 httpd lights ready and waiting at 500k apiece is no sweat.
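
For reference, the split looks roughly like this -- the addresses, paths
and numbers are placeholders, and each server gets its own config file:

    # httpd_light.conf -- tiny static-only Apache bound to the image IP
    Listen 1.2.3.4:80
    DocumentRoot /web/images
    KeepAlive On
    StartServers 40

    # httpd_heavy.conf -- the modperl server stays on the other address
    Listen 1.2.3.5:80
    KeepAlive Off

and the dynamic pages simply emit image URLs prefixed with http://1.2.3.4/.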

Regroup

All the above decisions were made slowly over 3 months, usually after
noticing that load average on the machine was climbing due to paging..
In retrospect, this was a silly way to manage things, since if the
site got a significant mention (which was what I was always after),
then the significant extra load could have sunk me.. but, through dumb
luck, that (mostly) didn't happen.

Problems come from the oddest places

What relationship has 10% packet loss on one upstream provider got to
do with machine memory ? anyone? anyone?
Yes.. a lot. For a nightmare week, the box was located downstream of a 
provider who was struggling with some serious bandwidth problems of his
own... people were connecting to the site via this link, and packet loss
was such that retransmits and tcp stalls were keeping httpd heavies
around for much longer than normal.. instead of blasting out the data
at high or even modem speeds, they would be stuck at 1k/sec or stalled
out...  people would press stop and refresh, httpds would take 300
seconds to timeout on writes to no-one.. it was a nightmare. 
Those problems didnt go away till I moved the box to a place closer
to some decent backbones.

So what is the summary?

Modperl is awesome.. write a little modperl module that prints 10k of text
and exits.. then run ApacheBench (ab). Unless you have made a mistake,
you will realize that a properly set up modperl program is capable of hits
per second far beyond what you can ever afford in bandwidth... Apache and
my $2000 homebrew Pentium III can spew out dumb modperl text pages at a
rate of 500-1000 hits a second.. pure perl and Apache speed is the absolute
last of your problems..
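
If you want to try that test, a handler along these lines will do
(mod_perl 1.x API; the package name and URI are arbitrary):

    package MySite::Bench;
    use strict;
    use Apache::Constants qw(OK);

    my $page = "x" x 10_240;               # ~10k of filler text

    sub handler {
        my $r = shift;
        $r->content_type('text/plain');
        $r->send_http_header;
        $r->print($page);
        return OK;
    }
    1;

    # httpd.conf:
    #   <Location /bench>
    #     SetHandler perl-script
    #     PerlHandler MySite::Bench
    #   </Location>
    #
    # then hammer it with:  ab -n 1000 -c 10 http://localhost/bench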

When the site first went up, I was getting around 2,000 dynamic page hits
a day, possibly 10,000 hits if you include the GIF files.. that was, at
peak, about 1 dynamic hit every 5-10 seconds.. and (although I did not
realize it) this would not scale, due to all the mistakes mentioned here.
Numbers rose to 1 dynamic hit every four seconds, then every 2 seconds,
then every second, and now at peak it has run at 4 hits per second and
averages 2 hits per second, plus another 10-20 static image hits a second
directed to the second IP address and an httpd light.

Interestingly (to me), as hits per second increased, I had to revisit
MaxRequestsPerChild. E.g. at 100 requests per child and a couple of hundred
requests a minute spread over the pool, Apache retires a child and
re-interprets your large perl module literally twice a minute! So
twice a minute, someone waits 2 seconds extra for a request..

3 hits/sec are handled by 20 (fixed max) modperl heavies, each taking
20MB of resident memory, with 30 MySQL daemons to service them and 31-50
httpd lights. A good 30% of these dynamic hits are long and complex,
involving dozens of SQL queries of different kinds that make up the final
page, and the rest vary in complexity. Every dynamic hit has to
re-authenticate a user cookie against a MySQL cookie database.

The other thing I was pleased to do was set up MRTG (actually, now Cricket)
to keep an eye on bandwidth so I had evidence..
   http://209.123.109.175/stats/mrtg/eth0/eth0.html
should give you an idea of the traffic patterns from site inception
until now.. as each of those new peaks arrived, one of the lessons
above was learned..

During an hour or so of 4 hits per second (thanks to a mention on ZDNet
AnchorDesk), the load average on the box (a Dell PowerEdge 1300 with two
Pentium II 400MHz CPUs) was 1.4, and bandwidth used was most of a T1..
At a more sedate 2 hits per second, load average is below 1.0.
Maybe a T1 is the max it will ever use, but if there is more
scaling to be done, then there is always Slashdot, proving you can
soak up "most of a T3": it runs on a dual Xeon, splits off the images
(and possibly the MySQL database?) to another machine, and caches as
much as it can.

Anyway, sorry this got boring.. just wanted to say that the Linux kernel,
Apache, MySQL and modperl and hundreds of supporting programs, all written
by enthusiasts and available for free, with sites such as IMDb and Slashdot,
prove capable of blowing the doors off any expensive commercial
product.. it is as though thousands of hobbyists got together and built a
car that was faster, more capable and more reliable than a factory-prepared
Porsche at Le Mans! (My site is more in the go-kart league, not because
modperl hasn't the horsepower, but because the architect/driver is still
on training wheels.)

comments welcome.

-Justin
