-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Philip Hazel wrote:
> On Tue, 25 Apr 2006, Michel Meyers wrote:
>
>> If there's anything I could run to debug it, please let me know (as I
>> can reproduce the problem pretty easily here).
>
> Perhaps compiling Exim with -g and taking a core dump of a stuck process
> would help. If it is in a loop, that should indicate where the loop is.
> You may have to do special things in order to get a dump of a setuid
> process such as Exim - seek advice from Debian experts.

OK, I'm not knowledgeable with debugging stuff at all so I will
definitely need help there.

>> Had to wait to get home to reproduce the problem, here's the result:
>>
>> fuser /var/spool/exim4/db/retry.lockfile
>> /var/spool/exim4/db/retry.lockfile: 9934 9963
>>
>> ps ax | grep 9934
>> 9934 ? R 1:32 /usr/sbin/exim4 -Mc 1FYTF5-0002a2-VT
>> 10034 pts/7 R+ 0:00 grep 9934
>>
>> ps ax | grep 9963
>> 9963 ? S 0:00 /usr/sbin/exim4 -Mc 1FYTFD-0002aU-BC
>> 10060 pts/7 S+ 0:00 grep 9963
>>
>> a little later:
>>
>> fuser /var/spool/exim4/db/retry.lockfile
>> /var/spool/exim4/db/retry.lockfile: 9934
>>
>> 9934 is the stuck process.
>
> OK, so it really is an Exim process that has the file open. This is
> *very* mysterious, because the code should close the file when it cannot
> get a lock. It really is very straightforward:

The thing is: The hanging process seems to have gotten a lock and then
hung up shortly after not being able to deliver the message (in whatever
cleanup it is doing afterwards). Here's a transcript of such an event:

2006-04-26 13:16:31 1FYi0V-0006vp-06 <= [EMAIL PROTECTED] H=localhost [127.0.0.1] P=esmtp S=824 [EMAIL PROTECTED]
2006-04-26 13:16:32 1FYi0V-0006vp-06 == [EMAIL PROTECTED] R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<[EMAIL PROTECTED]>: host mail.domain2 [82.149.xx.xx]: 451 GL - temporary problem. Please try again later.

After that, the process eats 100% CPU, keeps the lockfile open and is in
status 'tidying up after delivering 1FYi0V-0006vp-06'.

> The very next statement after writing to the log is to close the file!
> This should just exit without updating the retry database. (And I can't
> believe it's looping in the log-writing function, though anything is
> possible...)
>
>> 2006-04-25 21:30:58 1FYTFD-0002aU-BC <= [EMAIL PROTECTED] U=Debian-exim
>> P=spam-scanned S=2855 [EMAIL PROTECTED]
>> 2006-04-25 21:31:58 1FYTFD-0002aU-BC Failed to get write lock for
>> /var/spool/exim4/db/retry.lockfile: timed out
>> 2006-04-25 21:32:58 1FYTFD-0002aU-BC Failed to get write lock for
>> /var/spool/exim4/db/retry.lockfile: timed out
>> 2006-04-25 21:32:58 1FYTFD-0002aU-BC => user <[EMAIL PROTECTED]>
>> R=local_user T=mail_spool
>>
>> This time I didn't call 'runq', but I did issue several 'mailq's.
>
> A 'mailq' should not cause that. Ah, but did that message get stuck in a
> loop? Looks like it did manage to do a delivery. Did it complete? If so,
> that is what I would expect at one level - it failed to look at the
> retry database when routing, and then failed again when trying to update
> it at the end. (But we don't know why it couldn't get the lock in the
> first place.)

No, as mentioned before, I get those error messages for mails that
actually get delivered. The message that is stuck is never seen again in
the log until I manually kill the hanging process but while it is stuck,
the others 'fail to get write lock'.

> This is all very weird. Sorry it's taking so long to sort out - but it
> must be something unusual otherwise many other hosts would be having the
> same trouble.

No problem, I'm not in a hurry. I found a similar report (prior to
reporting my problem to Debian's BTS) that was never really resolved in
bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=314356 (the problem
seems to just have disappeared there although the submitter mentions
having changed HDDs in between). I also found other reports scattered
over the web with seemingly similar symptoms but I can't recall the
references to those right now.

Greetings,
Michel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959

iD8DBQFET1kA2Vs+MkscAyURAvlXAKCf42oJO74NK9GbhcAXGTjenfbu2gCeIPlJ
kM/N7YP2nUeTR1IeS4VFSs4=
=TuXW
-----END PGP SIGNATURE-----

--
## List details at http://www.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
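
For anyone following the thread, the behaviour Philip describes (try for
the lock, give up after a timeout, and close the file rather than keep it
open) can be sketched roughly as below. This is not Exim's code, which is
written in C and is not quoted in this thread. The function name
open_with_lock, the timeout and poll values, and the use of flock() are
all assumptions made purely for illustration.

```python
import errno
import fcntl
import os
import time


def open_with_lock(path, timeout=1.0, poll=0.05):
    """Try to take an exclusive lock on `path`.

    Returns the open file descriptor if the lock was obtained, or None
    if it could not be obtained within `timeout` seconds. Hypothetical
    helper for illustration only; not taken from Exim.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Non-blocking attempt, so we can enforce our own timeout.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd  # caller holds the lock until it closes fd
        except OSError as exc:
            if exc.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            # Timed out: close the descriptor so this process no longer
            # shows up as having the lock file open.
            os.close(fd)
            return None
        time.sleep(poll)
```

With code shaped like this, a process that fails to get the lock would
not be expected to appear in 'fuser' output for the lock file afterwards,
which is consistent with 9963 dropping out of the listing above while
9934, the process that does hold the lock, stays.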
