-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Philip Hazel wrote:
> On Fri, 21 Apr 2006, Andreas Metzler wrote:
>
> I wonder how we can track this down. There must be something different
> about Michel's system, because nobody else is reporting this, and there
> must be many cases of this kind of retrying happening to lots of people.

If there's anything I could run to debug it, please let me know (as I
can reproduce the problem pretty easily here).

>>> The given log had this:
>>>> 2006-04-04 09:13:48 1FQfhL-00035E-Ay Failed to get write lock for
>>>> /var/spool/exim4/db/retry.lockfile: timed out
>>>> 2006-04-04 09:14:48 1FQfhL-00035E-Ay Failed to get write lock for
>>>> /var/spool/exim4/db/retry.lockfile: timed out
>>> which suggests two tries for the same message, one minute apart. How
>>> often was the OP starting queue runners?
>> <the usual -q30m>
>
> Hmm. So why are there those two messages, I wonder?

Don't get too hung up on them. I do not recall the exact circumstances
under which those were generated (I might have called runq manually at
the time).

>> Note that I get those for mails that are not stuck
>
> They should just be getting read locks (and the message is wrong, as per
> the bug I found), but why are they failing? I guess the next question is
> what DBM library is in use?

I guess you mean libdb4.2 (package rev 4.2.52-23.1 is installed)?

> What kind of file system is used for
> /var/spool/exim4? I'm grasping at straws here.

/var is ext3

>> 2006-04-20 21:06:35 1FWeOO-000396-DP Spool file is locked (another
>> process is handling this message)
>
> At least *some* locking is working. :-)
>
> Does the OP have any kind of tool for looking at open files to see what
> process is using them? For example, fuser? The output of
>
> fuser /var/spool/exim4/db/retry.lockfile
>
> might be helpful.

Had to wait to get home to reproduce the problem, here's the result:

fuser /var/spool/exim4/db/retry.lockfile
/var/spool/exim4/db/retry.lockfile:  9934  9963

ps ax | grep 9934
 9934 ?        R      1:32 /usr/sbin/exim4 -Mc 1FYTF5-0002a2-VT
10034 pts/7    R+     0:00 grep 9934

ps ax | grep 9963
 9963 ?        S      0:00 /usr/sbin/exim4 -Mc 1FYTFD-0002aU-BC
10060 pts/7    S+     0:00 grep 9963

a little later:

fuser /var/spool/exim4/db/retry.lockfile
/var/spool/exim4/db/retry.lockfile:  9934

9934 is the stuck process. The other one was a normal message that got
delivered.
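
To make the comparison of the two fuser snapshots repeatable, here is a
minimal Python sketch. It is not part of Exim or fuser; the function
names parse_fuser_line and still_holding are hypothetical, and it
assumes the output has the "path: pid pid ..." shape shown above.

```python
import re


def parse_fuser_line(line):
    """Split one line of fuser output into (path, [pid, ...]).

    Assumes the 'path: pid pid ...' shape shown in this message.
    """
    # Split on the last colon so the PID list is always on the right.
    path, _, rest = line.rpartition(":")
    # fuser may append access-type letters to a PID (e.g. "9934c");
    # keep only the digits.
    pids = [int(p) for p in re.findall(r"(\d+)[a-zA-Z]*", rest)]
    return path.strip(), pids


def still_holding(earlier, later):
    """Return the PIDs present in both snapshots, sorted."""
    return sorted(set(earlier) & set(later))


if __name__ == "__main__":
    first = "/var/spool/exim4/db/retry.lockfile:  9934  9963"
    second = "/var/spool/exim4/db/retry.lockfile:  9934"
    _, pids_first = parse_fuser_line(first)
    _, pids_second = parse_fuser_line(second)
    print(still_holding(pids_first, pids_second))
```

A PID that shows up in every snapshot is a candidate for the stuck
process; it still needs to be checked against ps as above.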

2006-04-25 21:30:58 1FYTFD-0002aU-BC <= [EMAIL PROTECTED] U=Debian-exim
P=spam-scanned S=2855 [EMAIL PROTECTED]
2006-04-25 21:31:58 1FYTFD-0002aU-BC Failed to get write lock for
/var/spool/exim4/db/retry.lockfile: timed out
2006-04-25 21:32:58 1FYTFD-0002aU-BC Failed to get write lock for
/var/spool/exim4/db/retry.lockfile: timed out
2006-04-25 21:32:58 1FYTFD-0002aU-BC => user <[EMAIL PROTECTED]>
R=local_user T=mail_spool

This time I didn't call 'runq', but I did issue several 'mailq's.
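
In case it helps to pull these timeouts out of a larger log, the
following is a minimal Python sketch that groups the "Failed to get ...
lock" lines by message ID and reports the gap between consecutive
failures. It is not part of Exim; the name lock_failure_gaps is
hypothetical, and it assumes each entry starts with a
"YYYY-MM-DD HH:MM:SS" timestamp followed by the message ID, as in the
excerpts above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Timestamp, message ID, then the lock-failure text seen in the log.
LOCK_FAIL = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) Failed to get \w+ lock for"
)


def lock_failure_gaps(lines):
    """Map message ID -> gaps in seconds between consecutive lock failures."""
    times = defaultdict(list)
    for line in lines:
        match = LOCK_FAIL.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            times[match.group(2)].append(stamp)
    return {
        msg_id: [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        for msg_id, stamps in times.items()
    }


if __name__ == "__main__":
    sample = [
        "2006-04-25 21:31:58 1FYTFD-0002aU-BC Failed to get write lock for",
        "2006-04-25 21:32:58 1FYTFD-0002aU-BC Failed to get write lock for",
    ]
    print(lock_failure_gaps(sample))
```

Only the start of each entry is matched, so the sketch also works on
entries that a mail client has wrapped across two lines.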

Greetings,
        Michel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959

iD8DBQFETnuu2Vs+MkscAyURAqrEAJ46ZC1A+gTpUvPvGACmX7STvbkSUACg5z0G
rSRcJyoSgJYsap7r987JOms=
=O4gD
-----END PGP SIGNATURE-----

-- 
## List details at http://www.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
