A BUGNOTE has been added to this bug.
======================================================================
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000162
======================================================================
Reported By:                xing
Assigned To:                
======================================================================
Project:                    DBMail
Bug ID:                     162
Category:                   POP3 daemon
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             18-Jan-05 01:44 CET
Last Modified:              08-Apr-05 04:12 CEST
======================================================================
Summary:                    dbmail-pop3d zombies galore..
Description: 
I believe this problem started with 2.0.3.

dbmail-pop3d is creating a bunch of dbmail-pop3d zombie processes that must
be killed with kill -9.

I see a lot of the following in my mail log. 

serverchild.c,CreateChild: child_register failed
Jan 17 16:29:16 mail dbmail/pop3d[19630]: serverchild.c,CreateChild: child_register failed

as shown in ps:

19624 ?        Z      0:00 [dbmail-pop3d] <defunct>
19625 ?        Z      0:00 [dbmail-pop3d] <defunct>
19626 ?        Z      0:00 [dbmail-pop3d] <defunct>

I have 144 of these zombies at this very moment even though I just killed
them all and restarted pop3d daemon a minute ago.
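
For context: a <defunct> entry in ps is a child that has exited but has not
yet been waited on by its parent; it goes away once the parent calls
wait()/waitpid() or the parent itself exits. The following is only a minimal
sketch of non-blocking reaping, not the actual serverchild.c code. The names
reap_exited_children and spawn_short_lived_children are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns the number of children reaped in this call. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie collected, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* 0: children still running; ECHILD: none left */
    }
    return reaped;
}

/* Hypothetical helper: fork n children that exit immediately, so they
 * stay zombies until the parent reaps them. Returns how many were started. */
int spawn_short_lived_children(int n)
{
    int started = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: exit at once */
        if (pid > 0)
            started++;          /* parent: count successful forks */
    }
    return started;
}
```

A parent process would typically call something like reap_exited_children()
from its main loop whenever it has been notified of SIGCHLD, so that exited
children do not accumulate between restarts.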

Important Note: Setting trace=5 for pop3d ALLEVIATES the problem! Thus I
cannot provide trace info here. Weird. I have duplicated this many times
on my end before submitting this report.

Here are my relevant dbmail.conf entries:
[DBMAIL]
# Database settings
host=localhost
user=postfix
pass=postfix
db=dbmail
sqlsocket=/tmp/mysql.sock
# trace level for dbmail-maintenance
TRACE_LEVEL=1


[POP]
EFFECTIVE_USER=postfix            # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=postfix           # the group that dbmail-pop3d will run as
BINDIP=*                          # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                          # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5                       # default number of POP3 handlers (each is a process)
MAXCHILDREN=20                    # max. number of POP3 handlers
MAXCONNECTS=10000                 # the maximum number of connections a default child makes
TIMEOUT=31                        # the time (s) before the dbmail-pop3d should shut down a connection which is idle.
RESOLVE_IP=no                     # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=1




======================================================================

----------------------------------------------------------------------
 paul - 18-Jan-05 09:25 CET 
----------------------------------------------------------------------
Xing,

I recently changed the manage_stop_children code to fix bug 
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000158. Could
you please test the current 2.0 cvs code to check if that also helps in
your case?

----------------------------------------------------------------------
 xing - 18-Jan-05 11:47 CET 
----------------------------------------------------------------------
Checked out the CVS branch and still have the exact same problem.

Again, the weird thing here is that the bug is completely gone when trace
is set to 5 for the pop daemon in dbmail.conf.

My only theory, based on the trace level difference, is that trace=5
perhaps produces noticeable "delays" between thread/process forking which
allow the system to work. Without the verbose trace, is the server trying
to spawn way too fast? Just a wild guess.

Extra info:

I can reproduce this bug with trace=1 almost immediately upon pop3d
startup each time. However, sometimes the startup is fine but after
3-5 minutes all the children get unregistered, and the failed
re-registration attempts create the same zombie pool. So the problem is not
only related to startup.

edited on: 18-Jan-05 11:47

----------------------------------------------------------------------
 sersop - 26-Jan-05 11:39 CET 
----------------------------------------------------------------------
The same problem occurs with dbmail-pop3d and dbmail-lmtpd on a high-load system.

Fedora Core 2
Linux  2.6.10 #1 SMP Mon Jan 24 14:01:32 YEKT 2005 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------
 xing - 31-Jan-05 08:12 CET 
----------------------------------------------------------------------
Running the trace=5 workaround has so far eliminated the pop3d errors for
the past week, but today my 2.0.3 dbmail-pop3d servers completely locked
up. The daemon is still running, yet it will not accept any new connections.
I feel this is related to the zombie problem, in that both involve the
server starting and killing child processes.

----------------------------------------------------------------------
 paul - 28-Mar-05 12:52 CEST 
----------------------------------------------------------------------
Just an idea: I don't see any MINSPARECHILDREN/MAXSPARECHILDREN settings in
your config.

Not that this should really matter, but please check whether adding them
makes a difference...

----------------------------------------------------------------------
 xing - 08-Apr-05 04:12 CEST 
----------------------------------------------------------------------
Paul,

I had been running with trace=2 for both pop/imap daemons to avoid the
zombie problem. For whatever reason, the extra logging stopped the runaway
processes. Just tried your advice of adding:

MINSPARECHILDREN=2
MAXSPARECHILDREN=4

to both my imap/pop config sections in dbmail.conf and so far it has been
running zombie free on trace=1 for 48 hours. Can't say it's fixed for sure
but it looks like it. The zombie problem usually manifests itself within
minutes under high load.
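
To illustrate what a min/max spare pair usually controls in a preforking
server, here is a rough sketch. It assumes Apache-prefork-style semantics
(keep the number of idle children between the two bounds, never exceeding
MAXCHILDREN); this thread does not confirm that dbmail implements it this
way, and spare_adjustment is a hypothetical name.

```c
#include <assert.h>

/* Decide how to resize the child pool.
 * Returns > 0: number of children to spawn,
 *         < 0: number of idle children to retire (as a negative count),
 *           0: leave the pool alone.
 * ASSUMPTION: Apache-prefork-like behaviour, not taken from dbmail. */
int spare_adjustment(int total, int idle,
                     int min_spare, int max_spare, int max_children)
{
    assert(min_spare <= max_spare);
    assert(idle >= 0 && idle <= total);

    if (idle < min_spare) {
        int want = min_spare - idle;        /* spares we are short of */
        int room = max_children - total;    /* headroom under the hard cap */
        if (room < 0)
            room = 0;
        return want < room ? want : room;
    }
    if (idle > max_spare)
        return -(idle - max_spare);         /* too many idle: retire the excess */
    return 0;
}
```

With MINSPARECHILDREN=2, MAXSPARECHILDREN=4 and MAXCHILDREN=20, a pool of 5
busy children would be asked to spawn 2 more, while a pool with 7 idle
children would be asked to retire 3.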

Bug History
Date Modified   Username       Field                    Change
======================================================================
18-Jan-05 01:44 xing           New Bug
18-Jan-05 09:25 paul           Bugnote Added: 0000539
18-Jan-05 11:42 xing           Bugnote Added: 0000540
18-Jan-05 11:47 xing           Bugnote Edited: 0000540
26-Jan-05 11:39 sersop         Bugnote Added: 0000569
31-Jan-05 08:12 xing           Bugnote Added: 0000572
28-Mar-05 12:52 paul           Bugnote Added: 0000637
08-Apr-05 04:12 xing           Bugnote Added: 0000653
======================================================================
