Hi,
  I have an email system with more than 35000 accounts that processes around 
380000 messages a day on 4 servers. For more than a year I have been facing 
random hangs on the SMTP server.

  At a random time (it may be hours, days or weeks) the main couriertcpd keeps 
running and accepting connections (until the max clients limit is reached) but 
the child processes never end.
  Today I got some useful strace output that may help to solve the problem.
  The child process gets stuck in an infinite loop reading smtpaccess.dat. 
All the child couriertcpd processes I straced are in the same loop.
  smtpaccess.dat was not modified.
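
  For illustration only (this is not taken from the Courier source): the 
pattern described above looks like a reader that wants a full block and keeps 
retrying after read() returns 0. A minimal Python sketch of a block reader that 
instead treats a 0-byte read as end of file and returns the short block; the 
name read_exact is hypothetical.

```python
import os

def read_exact(fd, want):
    """Read up to `want` bytes from fd, stopping at EOF instead of retrying."""
    chunks = []
    remaining = want
    while remaining > 0:
        data = os.read(fd, remaining)
        if not data:
            # read() returned 0 bytes: end of file. Retrying here without a
            # limit is what would produce an endless loop of zero-length reads.
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

  The caller then has to check the returned length and treat a short block as 
an error or a truncated file, rather than assuming it always gets `want` bytes.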

  The problem occurs on all servers. I have already reinstalled some of them. 
Some run 32-bit Debian, some 64-bit Debian. Some are fresh installs, some are 
old installs. But the problem happens on all of them.

  I tried to debug the source code to find where the problem is, but it seems 
too complex for me. Maybe you (Sam?) can help me with what I can do to solve 
this problem?

Marcus

1) strace for a child couriertcpd process during normal operation
------------------------------------------------
17:46:05.482106 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], 
SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:05.482278 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:05.482683 brk(0x2503000) = 0x2503000
17:46:05.482974 brk(0x2524000)               = 0x2524000
17:46:05.483454 brk(0x2545000)               = 0x2545000
17:46:05.483729 lseek(4, 8192, SEEK_SET)               = 8192
17:46:05.484210 read(4, "\0\0\0\0\r\0\0\0\31\0\0\0\271\3627 
216.\2474\0\0\0\0\0\0\16\0\0\0\31"..., 4096) = 4096
17:46:05.484686 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), 
inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, 
sin6_scope_id=0}, [28]) = 0
17:46:05.485206 open("/etc/resolv.conf", O_RDONLY)               = 6
.
.
. normal process end..
---------------------------------------------


2) strace for a child couriertcpd process at the start of the lock
----------------------------------------------
17:46:43.742191 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], 
SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:43.743147 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:43.748307 brk(0x2503000) = 0x2503000
17:46:43.753416 brk(0x2524000)               = 0x2524000
17:46:43.753745 brk(0x2545000) = 0x2545000
17:46:43.754507 lseek(4, 8192, SEEK_SET) = 8192
17:46:43.754836 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:46:43.755069 read(4, ""..., 2446) = 0
17:46:43.756458 read(4, ""..., 2446) = 0
17:46:43.756556 read(4, ""..., 2446) = 0
17:46:43.756994 read(4, ""..., 2446) = 0
17:46:43.757080 read(4, ""..., 2446)   = 0
17:46:43.757188 read(4, ""..., 2446) = 0
17:46:43.757276 read(4, ""..., 2446)    = 0
17:46:43.757367 read(4, ""..., 2446)    = 0
17:46:43.757452 read(4, ""..., 2446)     = 0
17:46:43.757534 read(4, ""..., 2446) = 0
17:46:43.757617 read(4, ""..., 2446) = 0
17:46:43.757703 read(4, ""..., 2446)    = 0
17:46:43.757794 read(4, ""..., 2446)     = 0
17:46:43.757877 read(4, ""..., 2446) = 0
17:46:43.757960 read(4, ""..., 2446) = 0
17:46:43.758047 read(4, ""..., 2446) = 0
17:46:43.758155 read(4, ""..., 2446) = 0
17:46:43.758260 read(4, ""..., 2446) = 0
17:46:43.758570 read(4, ""..., 2446) = 0
17:46:43.758654 read(4, ""..., 2446) = 0
17:46:43.758762 read(4, 
"\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2446) 
= 2446
17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25), 
inet_pton(AF_INET6, "::ffff:74.86.76.42", &sin6_addr), sin6_flowinfo=0, 
sin6_scope_id=0}, [28]) = 0
17:46:43.759258 open("/etc/resolv.conf", O_RDONLY) = 6
.
.
.
. normal process end..
----------------------------------------------
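
In trace 2 the first read returns 1650 bytes and every retry asks for the 
remaining 2446 (1650 + 2446 = 4096) until one retry finally returns 2446 bytes. 
That could be consistent with the file changing under the reader, although the 
trace alone does not prove it. One possible way to check is to record the 
inode, size and mtime of smtpaccess.dat over time. A rough Python sketch; the 
function names are hypothetical and the path must be supplied by the caller, 
since it differs between installations.

```python
import os
import time

def snapshot(path):
    """Return (inode, size, mtime in ns) for path."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)

def watch(path, interval=1.0, rounds=60):
    """Print a line each time the inode, size or mtime of path changes."""
    last = snapshot(path)
    for _ in range(rounds):
        time.sleep(interval)
        now = snapshot(path)
        if now != last:
            # A new inode means the file was replaced; a new size or mtime
            # means it was written in place.
            print(time.strftime("%H:%M:%S"), "changed:", last, "->", now)
            last = now
```

Running watch() against the file while the hang starts would show whether 
anything touched it at that moment.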


3) strace for a child couriertcpd during the smtp hang
---------------------------------------------
17:46:49.944384 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x404ae0, [CHLD], 
SA_RESTORER|SA_RESTART, 0x7f4144f8af60}, 8) = 0
17:46:49.950918 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
17:46:49.951420 brk(0x2503000)          = 0x2503000
17:48:05.794865 brk(0x2524000) = 0x2524000
17:48:05.795256 brk(0x2545000) = 0x2545000
17:48:45.322594 lseek(4, 8192, SEEK_SET)               = 8192
17:48:45.322770 read(4, "201.17.129.83allow,SIZELIMIT=5242"..., 4096) = 1650
17:48:45.322967 read(4, ""..., 2446) = 0
17:48:45.323220 read(4, ""..., 2446)   = 0
17:48:45.323390 read(4, ""..., 2446) = 0
17:48:45.323529 read(4, ""..., 2446)    = 0
17:48:45.323675 read(4, ""..., 2446)    = 0
17:48:45.323812 read(4, ""..., 2446)     = 0
17:48:45.323937 read(4, ""..., 2446) = 0
17:48:45.324074 read(4, ""..., 2446) = 0
17:48:45.324215 read(4, ""..., 2446)    = 0
17:48:45.324377 read(4, ""..., 2446)     = 0
.
.
. until I restart courier-mta
---------------------------------------------
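
To spot this state without reading the traces by hand, something like the 
following Python sketch could scan a saved strace log and report the longest 
run of consecutive read() calls that returned 0. It assumes one syscall per 
line (for example a log written with strace -o, not the line-wrapped text 
pasted in this mail); the function name is hypothetical.

```python
import re

# Matches e.g.:  17:48:45.322967 read(4, ""..., 2446) = 0
READ_RE = re.compile(r"read\((\d+),.*\)\s*=\s*(\d+)\s*$")

def longest_zero_read_run(lines):
    """Return the longest run of consecutive read() calls that returned 0."""
    longest = current = 0
    for line in lines:
        m = READ_RE.search(line)
        if m and m.group(2) == "0":
            current += 1
            longest = max(longest, current)
        else:
            # Any other syscall, or a read that returned data, ends the run.
            current = 0
    return longest
```

A run of a few dozen, as in trace 2, recovers by itself; a run that keeps 
growing while the log is being written matches the hang in trace 3.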
