We recently converted our main mail server (30,000+ users) from cyrus-1.6 to
cyrus-2.0.12. We had converted a smaller server (6000+ users) to 2.0.9 some
time earlier. We had also tried 2.0.9 on the larger server, but that version
has severe performance problems with that many mailboxes.

Things looked pretty good initially, but after a few days the server stopped
responding to POP and IMAP requests. lsof and ps showed hundreds of lmtpd
processes, and the count was increasing. Around that time the machine stopped
responding entirely, and we were forced to reboot before we could gather more
information.

This has happened 4 more times since, at intervals of 1 to 4 days (always
during off hours, although that may not be significant). On one of these
occasions I was able to get in and send a TERM signal to the master process.
Everything shut down cleanly, and things worked fine once I restarted the
master process. From this it appears that when a process is aborted in this
fashion, some resource remains locked, causing all new processes (lmtpd, imapd
and pop3d) to hang.

On examining the logs, I found that each of these incidents was immediately
preceded by the message:

"signaled to death by 6"

Four times the process in question was imapd; once it was lmtpd.
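
For anyone scanning their own logs for the same symptom, here is a minimal
sketch that pulls the signal number out of such messages and maps it to a
name. Signal 6 is SIGABRT, which is normally raised by abort(), for example
after a failed assertion. The sample log line in the comment is hypothetical;
only the "signaled to death by N" text is taken from our logs.

```python
import re
import signal

# Matches the fragment seen in our logs; the rest of the line is not assumed.
PATTERN = re.compile(r"signaled to death by (\d+)")


def fatal_signals(lines):
    """Return (number, name) for every 'signaled to death' message found."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue
        number = int(match.group(1))
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Number is not a signal known on this platform.
            name = "unknown"
        results.append((number, name))
    return results


# Example with a made-up log line:
# fatal_signals(["master[123]: process 456 exited, signaled to death by 6"])
```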

No core file was produced. I have since changed the startup script to cd into
a directory writable by cyrus and removed the "ulimit -c 0" line from it, but
I have not yet gotten a core file to look at.
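
The startup change described above can be sketched roughly as follows. This
is an illustration in Python rather than the shell script we actually use,
and the directory and master path in the comments are hypothetical
placeholders. Note that if the hard limit is already 0, this alone will not
enable core files.

```python
import os
import resource


def prepare_for_core_dumps(core_dir):
    """Raise the soft core-size limit to the hard limit and cd into core_dir."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Similar in spirit to dropping "ulimit -c 0": allow core files up to
    # whatever the hard limit permits instead of forbidding them.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # By default the core file is written to the current directory, so it
    # must be one the daemon's user can write to.
    os.chdir(core_dir)
    return resource.getrlimit(resource.RLIMIT_CORE)


# Hypothetical usage before starting the master process:
#   prepare_for_core_dumps("/var/tmp/cyrus-cores")
#   os.execv("/usr/cyrus/bin/master", ["master"])
```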

In the meantime, I'm posting this to the list on the off chance someone else has
seen and debugged this problem.

The mail server is a dual Pentium III 500 with 1GB RAM and 100GB of hardware
RAID, running Red Hat 7.0 with all current updates applied except the kernel,
which is kernel-smp-2.2.16-22.

--
Irelann Kerry Anderson          phone:    (207)581-3508
Systems Group                   internet  [EMAIL PROTECTED]
UNET (formerly CAPS) Technology Services
University of Maine System


