A BUGNOTE has been added to this bug.
======================================================================
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000162
======================================================================
Reported By:                xing
Assigned To:
======================================================================
Project:                    DBMail
Bug ID:                     162
Category:                   POP3 daemon
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             18-Jan-05 01:44 CET
Last Modified:              08-Apr-05 04:12 CEST
======================================================================
Summary:                    dbmail-pop3d zombies galore..
Description:
I believe this problem started with 2.0.3.

dbmail-pop3d is creating a bunch of dbmail-pop3d zombie processes that must be killed with kill -9.

I see a lot of the following in my mail log:

serverchild.c,CreateChild: child_register failed
Jan 17 16:29:16 mail dbmail/pop3d[19630]: serverchild.c,CreateChild: child_register failed

As shown in ps:

19624 ? Z 0:00 [dbmail-pop3d] <defunct>
19625 ? Z 0:00 [dbmail-pop3d] <defunct>
19626 ? Z 0:00 [dbmail-pop3d] <defunct>

I have 144 of these zombies at this very moment, even though I killed them all and restarted the pop3d daemon only a minute ago.

Important note: setting trace=5 for pop3d ALLEVIATES the problem! So I cannot provide trace info here. Weird. I reproduced this many times on my end before submitting this report.

Here are my relevant dbmail.conf entries:

[DBMAIL]
# Database settings
host=localhost
user=postfix
pass=postfix
db=dbmail
sqlsocket=/tmp/mysql.sock
# trace level for dbmail-maintenance
TRACE_LEVEL=1

[POP]
EFFECTIVE_USER=postfix # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=postfix # the group that dbmail-pop3d will run as
BINDIP=* # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110 # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5 # default number of POP3 handlers (each is a process)
MAXCHILDREN=20 # max. number of POP3 handlers
MAXCONNECTS=10000 # the maximum number of connections a default child makes
TIMEOUT=31 # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=1
======================================================================

----------------------------------------------------------------------
 paul - 18-Jan-05 09:25 CET
----------------------------------------------------------------------
Xing,

I recently changed the manage_stop_children code to fix bug http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000158. Could you please test the current 2.0 CVS code to check whether that also helps in your case?

----------------------------------------------------------------------
 xing - 18-Jan-05 11:47 CET
----------------------------------------------------------------------
I checked out the CVS branch and still have exactly the same problem. Again, the weird thing is that the bug is completely gone when trace is set to 5 for the pop daemon in dbmail.conf.

My only theory, based on the trace level difference, is that trace=5 introduces noticeable "delays" between thread/process forks, which allow the system to work. Without the verbose trace, the server may be trying to spawn far too fast. Just a wild guess.

Extra info: with trace=1 I can reproduce this bug almost immediately after pop3d startup, every time. Sometimes, however, startup is fine, but after 3-5 minutes all the children get unregistered, and the failed registration attempts create the same zombie pool. So the problem is not only related to startup.
edited on: 18-Jan-05 11:47
----------------------------------------------------------------------
 sersop - 26-Jan-05 11:39 CET
----------------------------------------------------------------------
The same problem occurs for dbmail-pop3d and dbmail-lmtpd on a high-load system:

Fedora Core 2
Linux 2.6.10 #1 SMP Mon Jan 24 14:01:32 YEKT 2005 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------
 xing - 31-Jan-05 08:12 CET
----------------------------------------------------------------------
The trace=5 workaround has eliminated the pop3d errors for the past week, but today my 2.0.3 dbmail-pop3d servers locked up completely. The daemon is still running, yet it will not accept any new connections. I suspect this is related to the zombie problem, since both involve the server starting and killing child processes.

----------------------------------------------------------------------
 paul - 28-Mar-05 12:52 CEST
----------------------------------------------------------------------
Just an idea: I don't see any MINSPARECHILDREN/MAXSPARECHILDREN settings in your config. Not that this should really matter, but please check whether adding them makes a difference...

----------------------------------------------------------------------
 xing - 08-Apr-05 04:12 CEST
----------------------------------------------------------------------
Paul,

I had been running with trace=2 for both the pop and imap daemons to avoid the zombie problem. For whatever reason, the extra logging stopped the runaway processes. I just tried your advice of adding:

MINSPARECHILDREN=2
MAXSPARECHILDREN=4

to both my imap and pop sections in dbmail.conf, and so far it has been running zombie-free on trace=1 for 48 hours. I can't say for sure that it's fixed, but it looks like it. Under high load the zombie problem usually shows up within minutes.
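
To make Paul's suggestion concrete, here is a hedged sketch of one way a pre-forking manager could use spare-children limits. It assumes, as the names suggest, that MINSPARECHILDREN and MAXSPARECHILDREN bound the number of idle handlers. This is NOT DBMail's serverchild.c: the real logic is not shown in this report, and pool_config and spare_adjustment are hypothetical names.

```c
#include <assert.h>

/* Hypothetical model of a pre-forking server pool. This is NOT DBMail's
 * serverchild.c; the struct and function names are illustrative only. */
struct pool_config {
    int min_spare;     /* like MINSPARECHILDREN: fewest idle handlers wanted */
    int max_spare;     /* like MAXSPARECHILDREN: most idle handlers tolerated */
    int max_children;  /* like MAXCHILDREN: hard cap on total handlers */
};

/* Decide how the pool should change on one pass of the manager loop.
 * total: handlers currently alive; idle: those not serving a client.
 * Returns a positive count to fork, a negative count to stop, or 0. */
static int spare_adjustment(const struct pool_config *cfg, int total, int idle)
{
    assert(cfg->min_spare <= cfg->max_spare);
    assert(idle >= 0 && idle <= total);

    if (idle < cfg->min_spare) {
        int wanted = cfg->min_spare - idle;
        int room = cfg->max_children - total;  /* never exceed the cap */
        if (room < 0)
            room = 0;
        return wanted < room ? wanted : room;
    }
    if (idle > cfg->max_spare)
        return -(idle - cfg->max_spare);       /* trim surplus idle handlers */
    return 0;                                  /* inside the spare window */
}
```

Take xing's values (2 and 4) with MAXCHILDREN=20. A pool of 5 busy handlers with none idle would fork 2 more, and a pool with 7 idle handlers would stop 3. Because a single pass never forks more than min_spare processes, calling this once per loop iteration is one way to avoid the burst spawning xing speculated about. That is a design assumption, not observed DBMail behaviour.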

Bug History
Date Modified   Username  Field           Change
======================================================================
18-Jan-05 01:44 xing      New Bug
18-Jan-05 09:25 paul      Bugnote Added:  0000539
18-Jan-05 11:42 xing      Bugnote Added:  0000540
18-Jan-05 11:47 xing      Bugnote Edited: 0000540
26-Jan-05 11:39 sersop    Bugnote Added:  0000569
31-Jan-05 08:12 xing      Bugnote Added:  0000572
28-Mar-05 12:52 paul      Bugnote Added:  0000637
08-Apr-05 04:12 xing      Bugnote Added:  0000653
======================================================================
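
The `Z` / `<defunct>` entries in the ps output from the original report are processes that have already exited. Each one stays in the process table only until its parent collects its exit status with wait()/waitpid(). Sending SIGKILL to the zombie itself has no effect. It disappears when the parent reaps it, or when the parent exits and init adopts and reaps it. This report does not establish whether or how the 2.0.3 parent reaps its children. The block below is only a minimal, generic POSIX sketch of the usual remedy: a SIGCHLD handler that loops on waitpid(..., WNOHANG). The names install_sigchld_reaper, spawn_exiting_children and reaped_count are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Number of children collected so far; written only by the handler. */
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: several exits can be merged into one signal, so keep
 * calling waitpid() with WNOHANG until no exited children remain. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;   /* waitpid() may change errno */
    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the reaper; returns 0 on success, -1 on failure (like sigaction). */
static int install_sigchld_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: resume interrupted accept()/read() calls;
     * SA_NOCLDSTOP: no signal for children that merely stop. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical demo helper: fork n children that exit immediately, then
 * poll for up to about 2 seconds until the handler has reaped them all.
 * Returns how many were reaped, or -1 if fork() failed. */
static int spawn_exiting_children(int n)
{
    sig_atomic_t start = reaped_count;
    struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: exit at once */
    }
    for (int tries = 0; tries < 200 && reaped_count - start < n; tries++)
        nanosleep(&tick, NULL); /* an early EINTR return is harmless here */
    return (int)(reaped_count - start);
}
```

In a real daemon, the process that creates the handlers would install the reaper once, before its first fork(). Without such a loop, every child that exits (for example after a failed registration) remains `<defunct>` until the parent itself goes away.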