Hi. Thanks for answering.
Mon, 30 Apr 2012 16:15:03 -0700, Mark Sapiro wrote:

> > 2/ Cron/mailmanctl
> >
> > ps auxww| grep mailmanctl |grep -v grep
> > -> Nothing.
>
> How about
>
> ps auxww| grep qrunner |grep -v grep

Nothing either.

> > 7/ Locks
> >
> > /var/lib/mailman/locks -> /var/lock/mailman
> >
> > ll /var/lock/mailman
> > total 0
>
> It appears that some process or person is stopping Mailman.

OK. I need to figure out which.

> > 8/ Logs
> >
> > /var/log/mailman/error :
> > Apr 30 03:16:21 2012 mailmanctl(11685): No child with pid: 17093
> > Apr 30 03:16:21 2012 mailmanctl(11685): [Errno 3] No such process
> > Apr 30 03:16:21 2012 mailmanctl(11685): Stale pid file removed.
>
> How about /var/log/mailman/qrunner ?

Each day, I have something like this:

Apr 28 03:16:33 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 28 03:16:34 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.

The day it stopped, I got this:

Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner exiting.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner exiting.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner exiting.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner exiting.

Sorry for the mess here, but I think you get the idea. It seems to happen during a cron job.

Bug reports that could be related:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=505638
https://bugs.launchpad.net/mailman/+bug/265855

> > modified
> > /var/lib/mailman/Mailman/Handlers/SMTPDirect.py
> > to add
> > self.__conn.set_debuglevel(1)
>
> And yet you are not logging any smtp debugging in Mailman's error log.
> There should be copious log information for every outgoing message.

There was, but it stopped. The last message for which I have a lot of info is from Apr 22, one week before Mailman stopped sending messages.
-rw-rw-r-- 1 list list      198 Apr 30 03:16 /var/log/mailman/error
-rw-rw-r-- 1 list list        0 Apr 22 03:16 /var/log/mailman/error.1
-rw-rw-r-- 1 list list        0 Apr 15 03:16 /var/log/mailman/error.2
-rw-rw-r-- 1 list list 36541617 Apr 22 01:59 /var/log/mailman/error.3

Should there be anything relevant in there?

> > Configuration
> > -------------
> >
> > Not sure this is useful, but
> > /etc/mailman/mm_cfg.py contains
> > MTA='LocalPostfix'
>
> The above line should cause significant problems when attempting to
> create or remove lists. it MUST be one of
>
> MTA = 'Postfix'
> MTA = 'Manual'
> MTA = None
>
> 'Postfix' means generate aliases and virtual-mailman files for Postfix.
> 'Manual' means display the necessary aliases
> None means don't do anything with aliases when lists are created/removed.

I configured Mailman 3 years ago. I don't remember everything, but that setting comes from here:
http://isp-control.net/documentation/howto/mail/setup_mailman

Is it such a bad idea? I suppose it is unrelated, anyway.

The good thing is that there is a relatively recent bug open on Debian that might get closed if we manage to root-cause and solve this.

I just did a little bit of cleanup tonight, after I realized the server was almost full, or at least the partition that hosts the Mailman queues and logs. Would we see something specific in case of lack of space?

Thank you for your help.

--
Jérôme
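
P.S. In case it helps to narrow this down, here is a minimal sketch in Python of how the qrunner log could be scanned for the days on which runners caught SIGTERM. The function names and the regular expression are my own assumptions, derived only from the log lines quoted above; nothing here ships with Mailman.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones quoted above, for example:
#   Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping.
# Lines without "caught SIG..." (such as "... qrunner exiting.") are skipped.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\((?P<pid>\d+)\) (?P<who>.+?) caught (?P<sig>SIG[A-Z]+)"
)


def signals_by_day(lines):
    """Map 'Mon DD YYYY' -> signal name -> set of pids that caught it."""
    result = defaultdict(lambda: defaultdict(set))
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        month, day, _time, year = match.group("stamp").split()
        key = f"{month} {day} {year}"
        result[key][match.group("sig")].add(int(match.group("pid")))
    return result


def days_with_sigterm(lines):
    """Return the days on which at least one process caught SIGTERM."""
    by_day = signals_by_day(lines)
    return sorted(day for day, sigs in by_day.items() if "SIGTERM" in sigs)
```

Feeding it the lines of /var/log/mailman/qrunner (for instance via `open(...).readlines()`) should list only the days where the usual SIGHUP was accompanied by a SIGTERM, which can then be compared against the cron and log-rotation schedule.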