Hello, courier-world! :-)
We have been using Courier on our production machine for about 6 months.
From the start, there was never any kind of problem - it's a great
software! :)
But when we changed the FreeBSD version from 4.11 to 5.4
(5.4-RELEASE-p8), there was a *big* performance degradation in the
authentication process and the load soared.
The symptons are:
- authentication process takes about 15s~20s to complete
(by using pop3 client, telnet to pop3 port, or even with
the "authtest" util)
- authdaemond process are the biggest CPU eaters (according to TOP)
- machine load is higher (almost the double than before)
We don't know which is cause and which is consequence... :-/
We tried to open more authdaemons instances, but after 8 daemons the
problems just get worse.
All Courier (and related) softwares/packages were recompiled, and all
was the same.
An upgrade to FreeBSD 6.0 also didn't help.
I made a "stress test" running 20 authtest in parallel, 3.000 time in
total, on two different machines:
- on a machine (single processor) with FreeBSD 4.11, it takes about
40s to complete (75 reqs./min)
- on the production machine (dual processors) with FreeBSD 6.0, it takes
2m50s (17.5 reqs./min)
At this point, I should stress that we were using Courier for 6 months
without any problem, and the only change we made was to upgrade the OS.
You might say "just keep using the 4.11 version", but it'll soon get
obsolete and won't get new features, and eventually not even security fixes.
Any ideas of what is the cause? And most important, the solution?
TRUSS
=====
A funny thing is that if I check the authdaemon with truss on FreeBSD
4.11 and that daemon is processing just one request, I get this:
---- 8< ------------------------------------------------------
(null)() = 1 (0x1)
accept(0x5,0xbfbffae0,0xbfbffa4c) ERR#35 'Resource temporarily
unavailable'
---- 8< ------------------------------------------------------
But if it's busy, I see this:
---- 8< ------------------------------------------------------
select(0x6,0xbfbffa60,0x0,0x0,0xbfbffa50) = 1 (0x1)
accept(0x5,{ AF_UNIX "" },0xbfbffa4c) = 4 (0x4)
fcntl(0x4,0x4,0x0) = 0 (0x0)
select(0x5,0xbfbff560,0x0,0x0,0xbfbff558) = 1 (0x1)
read(0x4,0x804c820,0x400) = 19 (0x13)
write(2,0xbfbfea48,47) = 47 (0x2f)
write(2,0xbfbfea48,35) = 35 (0x23)
geteuid() = 0 (0x0)
stat("/etc/spwd.db",0xbfbff420) = 0 (0x0)
open("/etc/spwd.db",0x0,00) = 6 (0x6)
fcntl(0x6,0x2,0x1) = 0 (0x0)
read(0x6,0x804f000,0x104) = 260 (0x104)
lseek(6,0x20f000,0) = 2158592 (0x20f000)
read(0x6,0x8053000,0x1000) = 4096 (0x1000)
close(6) = 0 (0x0)
write(2,0xbfbfedd8,175) = 175 (0xaf)
write(2,0xbfbfed58,174) = 174 (0xae)
select(0x5,0x0,0xbfbff420,0x0,0xbfbff418) = 1 (0x1)
write(4,0x804c820,115) = 115 (0x73)
close(4) = 0 (0x0)
---- 8< ------------------------------------------------------
If I do the same with FreeBSD 5 or 6, I see it reading the /etc/spwd.db
file several thousands times! :-(
I think that's the problem, but I just don't know who to blame, and how
--
... Indifference will certainly be the downfall of mankind, but who
cares?to fix it...
FACT SHEET
==========
- We have about 400 logins/minute at peak times (during business hours)
- FreeBSD 4.11 was OK! (very fast authentication, low load)
- FreeBSD 5.4-RELEASE-p8 is *SLOW* (each authentication = 15s~20s)
- FreeBSD 6.0-RELEASE-p2 is *SLOW* (each authentication = 15s~20s)
- Using "ports" versions:
courier-authlib-0.58_1
courier-authlib-base-0.58_1
courier-imap-4.0.6_1,1
- Machine specs:
Dual Intel(R) Xeon(TM) CPU 2.40GHz
1GB RAM
Raid
- /usr/local/etc/authlib/authdaemonrc:
##NAME: authmodulelist:2
authmodulelist="authpam"
##NAME: authmodulelistorig:3
authmodulelistorig="authpam"
##NAME: daemons:0
daemons=8
##NAME: authdaemonvar:2
authdaemonvar=/var/run/authdaemond
##NAME: subsystem:0
subsystem=mail
- log messages:
... authdaemond: modules="authpam", daemons=8
... authdaemond: Installing libauthpam
... authdaemond: Installation complete: authpam
- debug messages:
... pop3d: Connection, ip=[xxx]
... authdaemond: received auth request, service=pop3, authtype=login
... authdaemond: authpam: trying this module
... authdaemond: authpam: sysusername=xxx, sysuserid=<null>,
sysgroupid=350, homedir=/var/Users/xxx, address=xxx,
fullname=xxx, maildir=<null>, quota=<null>, options=<null>
... authdaemond: pam_service=pop3, pam_username=xxx
... authdaemond: dopam successful
Obs.: I removed the timestamps and other info for clarification; all
logs appears at once in the log files: even if the process takes 20s to
complete, the timestamps on the lines differ in only one second!
[]s!
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Courier-imap mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-imap