[Dbmail-dev] [DBMail 0000361]: IMAP zombies after about a day.

bugtrack Mon, 3 Jul 2006 12:59:14 +0200 (CEST)

A NOTE has been added to this issue. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=361 
====================================================================== 
Reported By:                blake
Assigned To:                paul
====================================================================== 
Project:                    DBMail
Issue ID:                   361
Category:                   IMAP daemon
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     assigned
target:                      
====================================================================== 
Date Submitted:             09-Jun-06 01:36 CEST
Last Modified:              03-Jul-06 12:59 CEST
====================================================================== 
Summary:                    IMAP zombies after about a day.
Description: 
After starting up dbmail-imapd, I start to see zombies after about a day's
use. I am the only regular user of the server, with the current
Thunderbird client. Usually when it happens, I'll notice some weird laggy
behavior on the client, check the server, and sure enough there will be
one or more zombies. Here is a level 5 trace from the last command a child
handled before becomign a zombie:


Jun  8 19:04:32 colo dbmail/imap4d[7313]: COMMAND: [6 UID fetch 1332440
(UID RFC822.SIZE BODY.PEEK[])]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[0]: 'fetch'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[1]: '1332440'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[2]: '('
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[3]: 'UID'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[4]: 'RFC822.SIZE'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[5]: 'BODY.PEEK'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[6]: '['
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[7]: ']'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: arg[8]: ')'
Jun  8 19:04:32 colo dbmail/imap4d[7313]: imap4.c,IMAPClientHandler:
Executing command uid...
Jun  8 19:04:32 colo dbmail/imap4d[7313]: db.c,db_acl_has_right: checking
ACL [read_flag] for user [1] on mailbox [5]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: db.c, db_acl_has_right: mailbox
[5] is owned by user [1], is that also [1]?
Jun  8 19:04:32 colo dbmail/imap4d[7313]: db.c, db_acl_has_right: mailbox
[5] is owned by user [1], giving all rights
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: parse args[2] =
[UID]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: args[idx = 2] =
UID (returning 3)
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: parse args[3] =
[RFC822.SIZE]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: args[idx = 3] =
RFC822.SIZE (returning 4)
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: parse args[4] =
[BODY.PEEK]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,_imap_session_fetch_parse_octet_range: parse token
[)]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_fetch_parse_args: parse args[7] =
[)]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: imapcommands.c,_ic_fetch:
fetch_start [1332440] fetch_end [1332440]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: dbmysql.c,db_query: query
[SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag,
recent_flag, DATE_FORMAT(internal_date, '%Y-%m-%d %T'), rfcsize,
message_idnr FROM dbmail_messages msg, dbmail_physmessage pm WHERE pm.id =
msg.physmessage_id AND message_idnr BETWEEN '1332440' AND '1332440' AND
mailbox_idnr = '5' AND status IN ('0','1') ORDER BY message_idnr ASC]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [* 1981 FETCH (]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [RFC822.SIZE 5541]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [ ]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [UID 1332440]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [ ]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,_imap_cache_update: cache message [1332440] filter
[1]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: dbmysql.c,db_query: query
[SELECT physmessage_id FROM dbmail_messages WHERE message_idnr =
'1332440']
Jun  8 19:04:32 colo dbmail/imap4d[7313]: dbmysql.c,db_query: query
[SELECT messageblk FROM dbmail_messageblks WHERE physmessage_id = '810148'
ORDER BY messageblk_idnr]
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-message.c,_set_content_from_stream: parse message
Jun  8 19:04:32 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,_imap_cache_update: cache size [1088]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [BODY[] {1088}  ]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [)  ]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: RESPONSE: [6 OK UID FETCH
completed  ]
Jun  8 19:04:32 colo dbmail/imap4d[7313]: imap4.c,IMAPClientHandler:
Finished command uid [0]
Jun  8 19:09:33 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_readln: error reading from
client
Jun  8 19:09:33 colo dbmail/imap4d[7313]: imap4.c,IMAPClientHandler: error
reading command -- bailing out
Jun  8 19:09:33 colo dbmail/imap4d[7313]:
dbmail-imapsession.c,dbmail_imap_session_printf: write to client socket
failed


FWIW: I also noticed this behavior with 2.0.10, but only used that for a
short time before upgrading to trunk.
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000370 waitpid() in ParentSigHander() function...
related to          0000363 Somtimes the count of grandchild proces...
====================================================================== 

---------------------------------------------------------------------- 
 aaron - 09-Jun-06 17:57  
---------------------------------------------------------------------- 
Please tell a little more about what's going on. Do all of the processes
zombie? Only the parent? Only the middle? Only the children?

Like this output from ps (trimmed a little)

parent:
    root      9809     1    /usr/local/sbin/dbmail-lmtpd
middle:
    nobody    9811  9809    /usr/local/sbin/dbmail-lmtpd
children:
    nobody    9816  9811    /usr/local/sbin/dbmail-lmtpd
    nobody    9818  9811    /usr/local/sbin/dbmail-lmtpd 

---------------------------------------------------------------------- 
 aaron - 09-Jun-06 18:09  
---------------------------------------------------------------------- 
Ok, I think I found it. dbmail_imap_session_printf has a TRACE_FATAL when
"write to client socket failed". I changed them to TRACE_ERROR with return
code of -2 (unrecoverable errors) -- although callers clean up and die on
return code -1, too, I figure it might be useful down the road to know
what happened.

Please test current SVN.

Now that should send a SIGCHLD to the parent process letting it know that
the child died. For some reason that SIGCHLD is probably not getting
through, and the child process remains a zombie until its status is
reported to the parent. Search for 'zombie' on this page for some good
reading:

http://www.win.tue.nl/~aeb/linux/lk/lk-5.html 

---------------------------------------------------------------------- 
 aaron - 11-Jun-06 02:34  
---------------------------------------------------------------------- 
I think I got this. 

---------------------------------------------------------------------- 
 aaron - 13-Jun-06 08:55  
---------------------------------------------------------------------- 
Wait just a minute. child_unregister is called with a child process exits,
and it sets the STATE_WAIT flag so that the parent can reap the return
process. Except I can't find the code that looks for STATE_WAIT children!

In fact, reap_child should probably be recoded to use the same method as
child_unregister -- a variant that takes a pid as an argument and gives it
STATE_WAIT status. reap_child would kill the process, then mark the
scoreboard entry as STATE_WAIT. The next call to manage_spare_children (or
a new function, like manage_zombie_children ;-) would reap all STATE_WAIT
statuses. 

---------------------------------------------------------------------- 
 paul - 03-Jul-06 12:55  
---------------------------------------------------------------------- 
I'm reopening this bug because it's still unresolved. I'm working on
cleaning up the pooling code. 

---------------------------------------------------------------------- 
 paul - 03-Jul-06 12:59  
---------------------------------------------------------------------- 
I've committed a cleanup of the preforking code that seems to address this
issue. I've been doing some heavy load-tests on this code, and it now
appears to hold up to it's task. 

Please test svn-trunk 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
09-Jun-06 01:36 blake          New Issue                                    
09-Jun-06 17:57 aaron          Note Added: 0001236                          
09-Jun-06 18:09 aaron          Note Added: 0001237                          
11-Jun-06 02:34 aaron          Status                   new => resolved     
11-Jun-06 02:34 aaron          Fixed in Version          => SVN Trunk       
11-Jun-06 02:34 aaron          Resolution               open => fixed       
11-Jun-06 02:34 aaron          Assigned To               => aaron           
11-Jun-06 02:34 aaron          Note Added: 0001240                          
12-Jun-06 18:11 aaron          Relationship added       related to 0000363  
13-Jun-06 08:55 aaron          Note Added: 0001248                          
03-Jul-06 12:55 paul           Note Added: 0001290                          
03-Jul-06 12:55 paul           Resolution               fixed => reopened   
03-Jul-06 12:56 paul           Assigned To              aaron => paul       
03-Jul-06 12:56 paul           Status                   resolved => assigned
03-Jul-06 12:56 paul           Relationship added       related to 0000370  
03-Jul-06 12:59 paul           Note Added: 0001291                          
======================================================================

[Dbmail-dev] [DBMail 0000361]: IMAP zombies after about a day.

Reply via email to