Hi Bertrand,

Thanks for your insightful email!
Just so you know, this server is an AMD K6-2 500MHz with 128MB RAM (133MHz), 2 UDMA100 drives (IBM), and 10Mbit bandwidth. The server runs Apache, qmail, and vpopmail (for POP3). The webserver is not primary (it doesn't have to have the fastest response time), as this box is mainly for the mailing lists.

The 2 hard disks are on 2 different IDE channels, as putting both disks on the same cable would drastically reduce the performance of both. The way it is organized is that the mail spool/queue is on the 2nd disk, while the OS and programs are on disk 1. Logging is also performed on disk 1, so that writing to the mail log won't interfere with the mail queue (the two commonly happen simultaneously).

From MY understanding, the "load average" shows how many programs are running, and not really how "stressed" the CPU is. I'm not exactly sure how this works (please correct me if I'm wrong), but 1 program taking 80% CPU might have a load average of 2, while 100 programs taking 0.5% each would take 50% CPU and have a load average of 8. Is that correct thinking?

The reason I say that is that qmail spawns a program called "qmail-remote" for EACH email to be sent. So if 200 emails are being sent concurrently, then 200 qmail-remotes are created. Granted... each of these takes up a tiny amount of RAM and CPU time, but I suspect (if the above is correct) that the load average would be artificially inflated because of it. I've sketched the disk layout and a couple of quick checks below.
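To make the disk arrangement concrete, it looks roughly like this (device names and mount points are illustrative, not copied from the actual box):

  # Disk 1 (IDE channel 1): OS, programs, and the mail log
  /dev/hda1   /                    # OS, Apache, qmail binaries
  /dev/hda2   /var/log             # mail log writes land here
  # Disk 2 (IDE channel 2): nothing but the queue
  /dev/hdc1   /var/qmail/queue     # queue I/O never competes with logging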
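On the load average question, the number can be read straight from the kernel (nothing here is specific to this box; it works on any Linux system):

  # Prints e.g. "4.84 3.94 3.88 5/138 10421":
  # the 1/5/15-minute load averages, runnable processes / total
  # processes, and the last PID used.
  cat /proc/loadavg

As far as I know, Linux counts both runnable processes and processes stuck in uninterruptible (disk) sleep, so a queue disk that can't keep up will push the load average up even while the CPU sits mostly idle.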
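And to test the qmail-remote theory, something like this shows how much of that load is just delivery processes (paths assume a stock /var/qmail layout):

  # How many qmail-remote processes are running right now?
  # The [q] keeps grep from matching its own process entry.
  ps ax | grep '[q]mail-remote' | wc -l

  # The cap set by the large-concurrency patch:
  cat /var/qmail/control/concurrencyremote

  # And how many messages are waiting in the queue:
  /var/qmail/bin/qmail-qstat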
We don't use NFS on this server. NFS on Linux, as you said, is pretty crummy and should be avoided if possible. We simply put the mail queue on a separate hard disk.

POP3 load is extremely minimal. It's mainly an outgoing mail server (mailing list server). People essentially use the website to send mail to the mailing list, so load on the POP3 server won't be an issue.

About the HK job market: do you have any official qualifications (e.g. a degree, diploma, cert., etc.)? In HK bosses like to see that... even more than in some other countries like Australia (not sure about the US).

BTW, was your mother headmistress of St. Paul before?

Sincerely,
Jason

----- Original Message -----
From: "schemerz" <[EMAIL PROTECTED]>
To: "Jason Lim" <[EMAIL PROTECTED]>
Sent: Wednesday, June 06, 2001 3:57 PM
Subject: Re: Finding the Bottleneck

On Wed, Jun 06, 2001 at 11:53:22AM +0800, Jason Lim wrote:
> Hi all,
>
> I was wondering if there is a way to find out what/where the bottleneck
> of a large mail server is.
>
> A client is running a huge mail server that we set up for them (running
> qmail), but performance seems to be limited somewhere. Qmail has already
> been optimized as far as it can go (big-todo patch, large concurrency
> patch, etc.).
>
> We're thinking one of the hard disks may be the main bottleneck (the
> mail queue is already on a separate disk on a separate IDE channel from
> other disks). Is there any way to find out how "utilized" the IDE
> channel/hard disk is, or how hard it is going? Seems that right now the
> only way we really know is by looking at the light on the server case
> (how technical ;-) ). Must be a better way...
>
> The bottleneck wouldn't be bandwidth... it is definitely with the
> server. Perhaps the CPU or kernel is the bottleneck (load average: 4.84,
> 3.94, 3.88, going up to 5 or 6 during heavy mailing)? Is that normal for
> a large mail server? We haven't run such a large mail server before
> (anywhere between 500k and 1M per day so far, increasing each day), so
> ANY tips and pointers would be greatly appreciated. We've already been
> playing around with hdparm to see if we can tweak the disks, but it
> doesn't seem to help much. Maybe some cache settings we can fiddle with?
> Maybe the mail queue disk could use a different file cache setting (each
> email being from 1K to 10K on average)?
>
> Thanks in advance!
>
> Sincerely,
> Jason

Jason,

I am a lurker on the list. I don't run Linux anymore, but recently a friend of mine encountered a similar load problem. Granted, he was running sendmail, but his main bottleneck wasn't the MTA at all. I will explain his situation and you can see whether you find any similarities with yours... The discussion pertains to the mail server at sunflower.com.

When my friend took over, the previous admin had left the place in a mess. One of the machines he inherited was a dual 400MHz Pentium II with about 256MB of RAM. It would be quite adequate for serving about 5000 accounts, except...

1) The server was running on a box using the Promise IDE RAID controllers. IDE for a small server would work fine, but this box was getting hit a lot.

2) This server was also the POP server. One of the design reasons for running POP and sendmail on the same box was that NFS-exporting mail spools on Linux was unsafe (NFS still sucks and WILL suck ad infinitum, BUT Linux was and still is probably the worst platform on which to employ NFS in a production system). So NFS-exporting the mail spools to separate POP3 boxes was out of the question. Another local ISP, grapevine.net, tried to do this with large Sun boxes to no avail... Because this is a cable-modem ISP, people configured their POP3 clients to pop their mail every 10 minutes or so... that times 5000 (roughly eight POP logins per second, sustained)... well, you get my point.

3) Sendmail was configured out of the box from Red Hat, which works well for a 700-person institution. 5000? No...

So when my friend got the box, he was looking at loads of 5-6 on any given day.

Now, your choice to go with qmail is indicative of a few things. First of all, you are probably NFS-exporting your Maildirs, which is safe to do, unlike plain flat spool files. Second of all, you aren't forking one large binary as with sendmail; you are forking several smaller ones with qmail, which is far less load with the virtual manager (you can configure sendmail to use qmail's local delivery agent, i.e. sendmail writing to Maildirs, but you aren't using sendmail, so that can't be the problem). You are probably exporting your Maildirs to another box to do POP/IMAP from, so those servers can't be the problem... (If the servers are running on the same box as the MTA, check in top to see if that is the problem. qpopper may be notorious for security flaws, but it is internally multithreaded, so loads drop like a rock. There is nothing an Openwall patch can't fix temporarily... check www.openwall.com.)

Lastly, you might want to use SCSI disks instead. Granted, UDMA100 is fast for burst rates, but it is still extremely CPU intensive. My friend went with Ultra2 disks, and on tests using postal (a mail benchmarking util) performance soared. I don't know what else to think of, but those are my shots in the dark.
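On the "how utilized is the disk" question from the original post: first check whether DMA is actually on, because a drive that has fallen back to PIO mode will burn CPU on every transfer (device names below are illustrative):

  # Show the current IDE settings; look for "using_dma = 1 (on)".
  hdparm /dev/hdc

  # If it's off, enable DMA, 32-bit I/O, and IRQ unmasking.
  # Try this on a quiet box first; some chipsets misbehave with -u1.
  hdparm -d1 -c1 -u1 /dev/hdc

  # Rough throughput check: buffered disk reads (-t) and cache reads (-T).
  hdparm -t -T /dev/hdc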
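Beyond that, watching the box during a heavy mailing separates CPU pressure from disk pressure; plain vmstat is enough (no qmail-specific tooling assumed):

  # Sample every 5 seconds while the queue is busy:
  #   r     runnable processes (CPU contention)
  #   b     processes blocked in uninterruptible sleep (disk contention)
  #   bi/bo blocks read in / written out per second across the disks
  #   id    CPU idle; high idle plus a high load average points at the disks
  vmstat 5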
On another topic: I am from Hong Kong and am currently investigating the job market back home to see how things are. I reside in the US until I get my permanent residence, after which I will return to Hong Kong to work if I can. My expertise is mostly BSD or Linux (I am flexible with BSD flavors, not so much with Linux distros, but hey...). I'm also a fluent Cantonese speaker, and my English isn't shabby. I have references here and job experience (give or take 2 years). What should I shoot for back home if that option were to be made available to me?

yours truly,
Bertrand Kotewall
([EMAIL PROTECTED] -- please mail me here, I don't have a static IP quite yet)

