Dear gurus,
I run a fairly low-traffic Mailman on a stock Debian woody server, which suddenly stopped working. I am clueless and looking for help. Details below.
My system:
---------
Debian stable (stock Debian woody, regularly updated from security.debian.org)
Debianized stock mailman package v.2.0.11
Debianized stock Exim package: version 3.35 #1 built 07-May-2004 08:25:17
Symptoms:
----------
A few days ago the server suddenly stopped processing incoming messages; they just accumulate in the qfiles subdir. Admin access via the web is working: I can add users, etc. No pending mails are reported by the web admin GUI. Mails are accepted by the MTA w/o complaints, but no mail goes out to the lists. Nothing. The non-mailman-related SMTP traffic flows as normal.
The server has been running for years w/o any real problem. It has run out of disk space before, but freeing up some disk space has always solved the problem.
Below comes the summary of my investigations. I am totally clueless about the problem; any help is highly appreciated.
I repeat: the server has worked for years, and no (intentional) config changes have happened. There were, however, reports from a list-admin of the server running out of disk space, but that has already been taken care of.
Zeroth examination: disk space check
------------------------------------
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hdb1             1.9G  1.8G  116M  94% /
/dev/hdb5             3.9G  3.6G  163M  96% /home
/dev/hdb3             1.9G  1.2G  738M  61% /var
/dev/hdb6             3.9G  3.1G  724M  81% /usr/local
/dev/hda1             7.6M  5.6M  1.6M  78% /boot
/dev/hda2             4.7G  3.2G  1.2G  72% /archives-hda2
Note, there is plenty of disk space in /var.
First examination: SMTP works
-----------------------------
According to the logs, Exim delivers. Just an example from Exim's mainlog, showing a successful delivery to the mailman list "nsht":
2004-12-15 08:34:11 1CeTfn-0002e9-00 <= [EMAIL PROTECTED] H=david.tmit.bme.hu [152.66.246.102] P=esmtp S=1865 [EMAIL PROTECTED]
2004-12-15 08:34:12 1CeTfn-0002e9-00 => nsht <[EMAIL PROTECTED]> D=list_director T=list_transport
2004-12-15 08:34:12 1CeTfn-0002e9-00 Completed
Furthermore, I actively use this Exim as my everyday default SMTP MTA, and it works just fine.
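To check the same thing across the whole mainlog rather than from one example, the delivery lines can be filtered by transport. This is only a minimal sketch, assuming a current Python 3 (not the Python 2.1 on this server) and the log layout shown in the example above; the name `deliveries_via` is made up for illustration.

```python
import re

# Matches Exim "=>" delivery lines: date, time, message id, then the rest.
DELIVERY = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<msgid>\S+) => (?P<rest>.*)$")


def deliveries_via(lines, transport):
    """Yield (msgid, line) for delivery lines handled by the given transport."""
    wanted = "T=" + transport
    for line in lines:
        line = line.strip()
        match = DELIVERY.match(line)
        # The transport appears as a whitespace-separated "T=name" field.
        if match and wanted in match.group("rest").split():
            yield match.group("msgid"), line
```

Passing an open mainlog file and "list_transport" would list every message Exim believes it handed over to Mailman, which can then be compared with what sits in qfiles.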
Second examination: The messages seem to reach the qfiles directory.
----------------------------------------------------------------------
There are various entries like this:
f0fb10de9b998a5a185~aa29819f1395b9.db size:115 date:Dec 15 23:03
f0fb10de9b998a5a185~a29819f1395b9.msg size:825 date:Dec 15 23:03
The content of a .db file:
leda:/var/lib/mailman/qfiles# cat -vte f0fb10de9b998a5a1858842d62aa29819f1395b9.db
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]([EMAIL PROTECTED]@[EMAIL PROTECTED]
The content of the .msg file looks like a normal SMTP envelope and body.
The biggest .msg file in this directory is 6656 bytes, therefore free disk space cannot be the issue.
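An inventory of the queue directory makes it easier to see how many entries are waiting and whether every .msg file has its .db companion. The following is a minimal sketch under assumptions: it is written for a current Python 3 (not the Python 2.1 on this server), the helper names `queue_inventory` and `unpaired` are hypothetical, and it only looks at file names and sizes without reading the queue entries themselves.

```python
import os
from collections import defaultdict


def queue_inventory(qdir):
    """Group .msg/.db files in qdir by base name, recording each file's size."""
    entries = defaultdict(dict)
    for name in os.listdir(qdir):
        base, ext = os.path.splitext(name)
        if ext in (".msg", ".db"):
            entries[base][ext] = os.path.getsize(os.path.join(qdir, name))
    return dict(entries)


def unpaired(entries):
    """Return base names that lack either the .msg or the .db file."""
    return sorted(base for base, files in entries.items() if len(files) != 2)
```

Calling `queue_inventory("/var/lib/mailman/qfiles")` and then `unpaired(...)` on the result would show any half-written entries left over from the time the disk was full.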
Third examination: perms seem to be O.K.
----------------------------------------
leda:/var/lib/mailman/qfiles# check_perms
No problems found
Fourth examination: checking the database of list "nsht" (the list of the reporting list-admin)
--------------------------------------------------------
leda:/var/lib/mailman/qfiles# check_db nsht
/var/lib/mailman/lists/nsht/config.db is fine
/var/lib/mailman/lists/nsht/config.db.last is fine
Note that none of the lists on the server seem to work (there are some tens of lists), neither "nsht" nor the others.
Fifth examination: checking crontab for mailman
-----------------------------------------------
leda:/var/lib/mailman/qfiles# cat /etc/cron.d/mailman
12,42 * * * * list [ -x /usr/bin/python -a -f /usr/lib/mailman/cron/run_queue ] && /usr/bin/python /usr/lib/mailman/cron/run_queue
# */5 * * * * list [ -x /usr/bin/python -a -f /usr/lib/mailman/cron/gate_news ] && /usr/bin/python /usr/lib/mailman/cron/gate_news
* * * * * list [ -x /usr/bin/python -a -f /usr/lib/mailman/cron/qrunner ] && /usr/bin/python /usr/lib/mailman/cron/qrunner
The cron daemon is up and running. The qrunner script runs every minute. See the next examination.
Sixth examination: checking mailman logs
--------------------------------------------
Everything seems to be normal, except that qrunner continually emits errors at each run to /var/lib/mailman/logs/error, such as these:
Dec 16 00:06:02 2004 qrunner(18367): Traceback (most recent call last):
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/mailman/cron/qrunner", line 283, in ?
Dec 16 00:06:02 2004 qrunner(18367):     kids = main(lock)
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/mailman/cron/qrunner", line 253, in main
Dec 16 00:06:02 2004 qrunner(18367):     keepqueued = dispose_message(mlist, msg, msgdata)
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/mailman/cron/qrunner", line 121, in dispose_message
Dec 16 00:06:02 2004 qrunner(18367):     if BouncerAPI.ScanMessages(mlist, mimemsg):
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/mailman/Mailman/Bouncers/BouncerAPI.py", line 59, in ScanMessages
Dec 16 00:06:02 2004 qrunner(18367):     addrs = func(msg)
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/mailman/Mailman/Bouncers/Postfix.py", line 39, in process
Dec 16 00:06:02 2004 qrunner(18367):     more = mfile.next()
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/python2.1/multifile.py", line 123, in next
Dec 16 00:06:02 2004 qrunner(18367):     while self.readline(): pass
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/python2.1/multifile.py", line 95, in readline
Dec 16 00:06:02 2004 qrunner(18367):     if marker == self.section_divider(sep):
Dec 16 00:06:02 2004 qrunner(18367):   File "/usr/lib/python2.1/multifile.py", line 159, in section_divider
Dec 16 00:06:02 2004 qrunner(18367):     return "--" + str
Dec 16 00:06:02 2004 qrunner(18367): TypeError : cannot add type "None" to string
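The last frame fails on `"--" + str` because `str` is None at that point. One possible reading, which is an assumption on my part and not something the log confirms, is that a queued message declares a multipart Content-Type without a boundary parameter, so the bounce scanner ends up handing None to multifile. The sketch below reproduces that kind of failure in a current Python 3 with the standard `email` package; it does not use Mailman or the old `multifile` module, and `section_divider` here is a stand-in that only copies the expression from the traceback.

```python
import email

# A made-up message that claims to be multipart but names no boundary.
RAW = "Content-Type: multipart/mixed\nSubject: test\n\nbody\n"


def section_divider(boundary):
    # Stand-in for the failing expression in the traceback: 'return "--" + str'.
    return "--" + boundary


msg = email.message_from_string(RAW)
boundary = msg.get_boundary()  # None, because the header has no boundary parameter

try:
    section_divider(boundary)
except TypeError as exc:
    print("TypeError:", exc)
```

If this reading is right, the error is tied to one particular queued message rather than to locks or permissions, which would fit the fact that qrunner fails the same way on every run.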
My attempts to fix what seemed to be a lock-file problem:
--------------------------------------------------
1. Since the reporting list-admin claimed there had been a temporary out-of-disk-space situation, I double-checked the available free space.
2. I stopped crond and inetd. I checked that no python process was lurking around, then I checked with "lsof" that none of the lock files in /var/lib/mailman/locks/ were held open by anyone. All lock files were more than several months(!) old. I deleted all lock files and restarted crond and inetd. Qrunner still fails with the above error log.
3. As a last attempt I sacrificed my 135 days of uptime :-( and rebooted the system, hoping that the Microsoft approach might help.
The system rebooted just fine, but mailman (qrunner) still does not work.
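The age check on the lock files from step 2 can be sketched as follows. It is only an illustration under assumptions: it targets a current Python 3 (not the Python 2.1 on this server), the function name `stale_files` and the age threshold are made up, and it only reports files. It does not check whether a process still holds them open, which is what "lsof" was used for.

```python
import os
import time


def stale_files(directory, max_age_days, now=None):
    """Return (name, age_in_days) for regular files older than max_age_days."""
    now = time.time() if now is None else now
    result = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # Age is measured from the last modification time.
        age_days = (now - os.path.getmtime(path)) / 86400.0
        if age_days > max_age_days:
            result.append((name, age_days))
    return result
```

For example, `stale_files("/var/lib/mailman/locks", 30)` would list lock files untouched for more than a month.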
Now I am out of ideas. Any advice?
Thanks: Gábor
------------------------------------------------------
Mailman-Users mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
