On Sat, 15 Apr 2006, Andreas Metzler wrote:

> gets stuck with 100% CPU usage and the only way to get rid of it is to
> kill it with signal 9. While the stuck process is there, the mainlog
> keeps mentioning messages like these:
> 2006-04-04 09:13:33 1FQfgD-0002kv-EZ Failed to get write lock for
> /var/spool/exim4/db/retry.lockfile: timed out
> Here's an example log entry of a message getting rejected (causing the
> process to go to 100% CPU):
> 2006-04-04 09:31:40 1FQ2wq-0005Kt-8R == [EMAIL PROTECTED] R=dnslookup
> T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT
> TO:<[EMAIL PROTECTED]>: host mail.removed.de [xx.xxx.xxx.xx]: 451 GL -
> temporary problem. Please try again later.
> I am a little bit at a loss on how to debug this, upon asking the
> submitter told us that the stuck process is listed as
> | 31441 tidying up after delivering 1FT0NS-0008AT-0D
> by exiwhat. According to google there have been similar reports on
> exim-users, none of which ended with a definitive solution.

I found one bug when I first looked at this, but it isn't a processing
bug. It is just that the log message would always say "Failed to get
write lock", even when the failure was for a read lock. That was easily
fixed.

I tried to simulate this problem by patching the code to pretend it had
failed to get a lock when trying to update the retry database while
tidying up after a 451 failure. (It is, in fact, a write lock here.)
Needless to say, I did not get a 100% loop. It just did what it is
supposed to do - that is, failed to update the hints. But of course I
was using release 4.61, not 4.60.

I suppose we'll have to look at the configuration that was being used.
The given log had this:

> 2006-04-04 09:13:48 1FQfhL-00035E-Ay Failed to get write lock for
> /var/spool/exim4/db/retry.lockfile: timed out
> 2006-04-04 09:14:48 1FQfhL-00035E-Ay Failed to get write lock for
> /var/spool/exim4/db/retry.lockfile: timed out

which suggests two tries for the same message, one minute apart. How
often was the OP starting queue runners?

I have a feeling this is going to be a long haul...

Philip

--
Philip Hazel, University of Cambridge Computing Service.
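
To illustrate the logging bug described above, here is a minimal sketch of a lock attempt with a timeout whose failure message names the kind of lock that was actually requested. It is written in Python rather than Exim's C and is not taken from the Exim source: the function names, the polling loop, and the default timeout are all assumptions made for illustration.

```python
import errno
import fcntl
import os
import time


def lock_failure_message(path, want_write, reason="timed out"):
    """Build the log text, naming the lock type that was requested.

    Hypothetical helper: the reported bug was that the text always said
    "write lock" even when a read lock had been requested.
    """
    kind = "write" if want_write else "read"
    return f"Failed to get {kind} lock for {path}: {reason}"


def try_lock(path, want_write, timeout=2.0, poll=0.1):
    """Try to lock `path`, giving up after `timeout` seconds.

    Returns (fd, None) on success, or (None, message) if the lock could
    not be obtained in time. Polling is an assumption of this sketch.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    # Exclusive lock for writers, shared lock for readers; never block.
    op = (fcntl.LOCK_EX if want_write else fcntl.LOCK_SH) | fcntl.LOCK_NB
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.lockf(fd, op)
            return fd, None
        except OSError as exc:
            # EACCES/EAGAIN mean another process holds a conflicting lock.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            os.close(fd)
            return None, lock_failure_message(path, want_write)
        time.sleep(poll)
```

On failure the caller gets the message back and carries on without updating the hints, which matches the behaviour seen in the simulation: the update is skipped rather than retried in a loop.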
