Hello, courier-world! :-)

We have been using Courier on our production machine for about 6 months. From the start, there was never any kind of problem - it's a great software! :)

But when we changed the FreeBSD version from 4.11 to 5.4 (5.4-RELEASE-p8), there was a *big* performance degradation in the authentication process and the load soared.

The symptons are:
 - authentication process takes about 15s~20s to complete
   (by using pop3 client, telnet to pop3 port, or even with
   the "authtest" util)
 - authdaemond process are the biggest CPU eaters (according to TOP)
 - machine load is higher (almost the double than before)

We don't know which is cause and which is consequence... :-/

We tried to open more authdaemons instances, but after 8 daemons the problems just get worse.

All Courier (and related) softwares/packages were recompiled, and all was the same.

An upgrade to FreeBSD 6.0 also didn't help.

I made a "stress test" running 20 authtest in parallel, 3.000 time in total, on two different machines:

- on a machine (single processor) with FreeBSD 4.11, it takes about
  40s to complete (75 reqs./min)
- on the production machine (dual processors) with FreeBSD 6.0, it takes
  2m50s (17.5 reqs./min)

At this point, I should stress that we were using Courier for 6 months without any problem, and the only change we made was to upgrade the OS. You might say "just keep using the 4.11 version", but it'll soon get obsolete and won't get new features, and eventually not even security fixes.

Any ideas of what is the cause? And most important, the solution?


TRUSS
=====

A funny thing is that if I check the authdaemon with truss on FreeBSD 4.11 and that daemon is processing just one request, I get this:

---- 8< ------------------------------------------------------
(null)()                          = 1 (0x1)
accept(0x5,0xbfbffae0,0xbfbffa4c) ERR#35 'Resource temporarily
                                  unavailable'
---- 8< ------------------------------------------------------


But if it's busy, I see this:

---- 8< ------------------------------------------------------
select(0x6,0xbfbffa60,0x0,0x0,0xbfbffa50)        = 1 (0x1)
accept(0x5,{ AF_UNIX "" },0xbfbffa4c)            = 4 (0x4)
fcntl(0x4,0x4,0x0)                               = 0 (0x0)
select(0x5,0xbfbff560,0x0,0x0,0xbfbff558)        = 1 (0x1)
read(0x4,0x804c820,0x400)                        = 19 (0x13)
write(2,0xbfbfea48,47)                           = 47 (0x2f)
write(2,0xbfbfea48,35)                           = 35 (0x23)
geteuid()                                        = 0 (0x0)
stat("/etc/spwd.db",0xbfbff420)                  = 0 (0x0)
open("/etc/spwd.db",0x0,00)                      = 6 (0x6)
fcntl(0x6,0x2,0x1)                               = 0 (0x0)
read(0x6,0x804f000,0x104)                        = 260 (0x104)
lseek(6,0x20f000,0)                              = 2158592 (0x20f000)
read(0x6,0x8053000,0x1000)                       = 4096 (0x1000)
close(6)                                         = 0 (0x0)
write(2,0xbfbfedd8,175)                          = 175 (0xaf)
write(2,0xbfbfed58,174)                          = 174 (0xae)
select(0x5,0x0,0xbfbff420,0x0,0xbfbff418)        = 1 (0x1)
write(4,0x804c820,115)                           = 115 (0x73)
close(4)                                         = 0 (0x0)
---- 8< ------------------------------------------------------

If I do the same with FreeBSD 5 or 6, I see it reading the /etc/spwd.db file several thousands times! :-(

I think that's the problem, but I just don't know who to blame, and how
--
... Indifference will certainly be the downfall of mankind, but who cares?to fix it...



FACT SHEET
==========

- We have about 400 logins/minute at peak times (during business hours)
- FreeBSD 4.11 was OK! (very fast authentication, low load)
- FreeBSD 5.4-RELEASE-p8 is *SLOW* (each authentication = 15s~20s)
- FreeBSD 6.0-RELEASE-p2 is *SLOW* (each authentication = 15s~20s)
- Using "ports" versions:
   courier-authlib-0.58_1
   courier-authlib-base-0.58_1
   courier-imap-4.0.6_1,1
- Machine specs:
   Dual Intel(R) Xeon(TM) CPU 2.40GHz
   1GB RAM
   Raid
- /usr/local/etc/authlib/authdaemonrc:

##NAME: authmodulelist:2
authmodulelist="authpam"

##NAME: authmodulelistorig:3
authmodulelistorig="authpam"

##NAME: daemons:0
daemons=8

##NAME: authdaemonvar:2
authdaemonvar=/var/run/authdaemond

##NAME: subsystem:0
subsystem=mail

- log messages:

... authdaemond: modules="authpam", daemons=8
... authdaemond: Installing libauthpam
... authdaemond: Installation complete: authpam

- debug messages:

... pop3d: Connection, ip=[xxx]
... authdaemond: received auth request, service=pop3, authtype=login
... authdaemond: authpam: trying this module
... authdaemond: authpam: sysusername=xxx, sysuserid=<null>,
    sysgroupid=350, homedir=/var/Users/xxx, address=xxx,
    fullname=xxx, maildir=<null>, quota=<null>, options=<null>
... authdaemond: pam_service=pop3, pam_username=xxx
... authdaemond: dopam successful

Obs.: I removed the timestamps and other info for clarification; all logs appears at once in the log files: even if the process takes 20s to complete, the timestamps on the lines differ in only one second!


[]s!


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Courier-imap mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-imap

Reply via email to