A BUGNOTE has been added to this bug.
======================================================================
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000162
======================================================================
Reported By:                xing
Assigned To:
======================================================================
Project:                    DBMail
Bug ID:                     162
Category:                   POP3 daemon
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             18-Jan-05 01:44 CET
Last Modified:              31-Jan-05 08:12 CET
======================================================================
Summary:                    dbmail-pop3d zombies galore..
Description:
I believe this problem started with 2.0.3.

dbmail-pop3d is creating a bunch of dbmail-pop3d zombie processes that
must be killed with kill -9.

I see a lot of the following in my mail log:

serverchild.c,CreateChild: child_register failed
Jan 17 16:29:16 mail dbmail/pop3d[19630]: serverchild.c,CreateChild: child_register failed

and the following in ps:

19624 ?        Z      0:00 [dbmail-pop3d] <defunct>
19625 ?        Z      0:00 [dbmail-pop3d] <defunct>
19626 ?        Z      0:00 [dbmail-pop3d] <defunct>

I have 144 of these zombies at this very moment, even though I killed
them all and restarted the pop3d daemon a minute ago.

Important Note: Setting trace=5 for pop3d ALLEVIATES the problem! Thus I
cannot provide trace info here. Weird. I have reproduced this many times
on my end before submitting this report.

Here are my relevant dbmail.conf entries:

[DBMAIL]
# Database settings
host=localhost
user=postfix
pass=postfix
db=dbmail
sqlsocket=/tmp/mysql.sock
# trace level for dbmail-maintenance
TRACE_LEVEL=1

[POP]
EFFECTIVE_USER=postfix   # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=postfix  # the group that dbmail-pop3d will run as
BINDIP=*                 # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                 # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5              # default number of POP3 handlers (each is a process)
MAXCHILDREN=20           # max. number of POP3 handlers
MAXCONNECTS=10000        # the maximum number of connections a default child makes
TIMEOUT=31               # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no            # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=1
======================================================================

----------------------------------------------------------------------
 paul - 18-Jan-05 09:25 CET
----------------------------------------------------------------------
Xing,

I recently changed the manage_stop_children code to fix bug
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000158.
Could you please test the current 2.0 cvs code to check if that also
helps in your case?

----------------------------------------------------------------------
 xing - 18-Jan-05 11:47 CET
----------------------------------------------------------------------
Checked out the CVS branch and still have the exact same problem. Again,
the weird thing here is that the bug is completely gone when trace is
set to 5 for the pop daemon in dbmail.conf.

My only theory, based on the trace level difference, is that trace=5
perhaps produces noticeable "delays" between thread/process forking
which allow the system to work. Without the verbose trace, is the server
trying to spawn way too fast? Just a wild guess.

Extra info: I can reproduce this bug with trace=1 almost immediately
upon pop3d startup each time. However, sometimes the startup is fine,
but after 3-5 minutes all the children get unregistered and the
registering/failed attempts create the same zombie pool. So the problem
is not only related to startup.

edited on: 18-Jan-05 11:47

----------------------------------------------------------------------
 sersop - 26-Jan-05 11:39 CET
----------------------------------------------------------------------
The same problem occurs for dbmail-pop3d and dbmail-lmtpd on a high-load
system:

Fedora Core 2
Linux 2.6.10 #1 SMP Mon Jan 24 14:01:32 YEKT 2005 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------
 xing - 31-Jan-05 08:12 CET
----------------------------------------------------------------------
Running the trace=5 workaround has so far eliminated the pop3d errors
for the past week, but today my 2.0.3 dbmail-pop3d servers completely
locked up. The daemon is running, yet it will not accept any new
connections. I feel this is related to the zombie problem, in that both
involve the server thread starting and killing child processes.

Bug History
Date Modified   Username       Field                    Change
======================================================================
18-Jan-05 01:44 xing           New Bug
18-Jan-05 09:25 paul           Bugnote Added: 0000539
18-Jan-05 11:42 xing           Bugnote Added: 0000540
18-Jan-05 11:47 xing           Bugnote Edited: 0000540
26-Jan-05 11:39 sersop         Bugnote Added: 0000569
31-Jan-05 08:12 xing           Bugnote Added: 0000572
======================================================================
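
For readers unfamiliar with the <defunct> entries in the ps output in the
description: a child process that has exited stays in the process table as
a zombie until its parent collects its exit status with wait() or
waitpid(). The sketch below is a generic POSIX illustration of that
mechanism, not code taken from dbmail's serverchild.c; the function names
spawn_short_lived_children and reap_children are hypothetical, and nothing
here is claimed to be the cause of, or the fix for, this bug.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that exit immediately.
 * Until the parent waits for them, ps lists each one as <defunct>.
 * Returns the number of children that were actually forked. */
int spawn_short_lived_children(int n)
{
    int started = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);      /* child: exit at once */
        if (pid > 0)
            started++;     /* parent: count successful forks */
    }
    return started;
}

/* Hypothetical helper: collect the exit status of finished children.
 * With block == 0 it uses WNOHANG and returns as soon as no exited
 * child is pending; with block != 0 it waits until no children remain.
 * Returns the number of children reaped. */
int reap_children(int block)
{
    int reaped = 0;
    int flags = block ? 0 : WNOHANG;

    for (;;) {
        pid_t pid = waitpid(-1, NULL, flags);
        if (pid > 0) {
            reaped++;      /* one zombie removed from the process table */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;      /* interrupted by a signal: retry */
        break;             /* 0: none exited yet; -1/ECHILD: no children */
    }
    return reaped;
}
```

A parent that forks workers would typically call something like
reap_children(0) from its main loop or after SIGCHLD, so that exited
children do not accumulate. Note that sending kill -9 to a zombie has no
effect, because the process has already exited; the entry only goes away
once the parent reaps it or the parent itself exits.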