Solaris-x86-10u8
Dovecot-1.2.7
Large mail setup, with quite a few MX servers working well. Generally everything
is super.
But every now and then we get "bouts of bad weather", which last only a short
time and, curiously, nearly always occur when mail is sent to customer mailing
lists. I suspect this is mostly because one message has multiple recipients
within the same domain. Perhaps.
Delivery is set up as follows:
main.cf:
virtual_transport = dovecot
dovecot_destination_recipient_limit = 1
dovecot_destination_concurrency_limit = 300
master.cf:
dovecot unix - n n - 70 pipe
  flags=DRhu user=dovecot:dovecot argv=/usr/local/libexec/dovecot/deliver
  -f ${sender} -d ${recipient}
Errors look something like:
example.com
Dec 06 08:46:25 deliver([email protected]): Info:
msgid=<[email protected]>: saved mail to INBOX
Dec 06 08:46:25 deliver([email protected]): Info:
msgid=<[email protected]>: saved mail to INBOX
Dec 06 08:46:28 deliver([email protected]): Error: userdb
lookup([email protected]) failed: Internal failure
Dec 06 08:46:28 deliver([email protected]): Error: userdb
lookup([email protected]) failed: Internal failure
Dec 06 08:46:28 deliver([email protected]): Error: userdb
lookup([email protected]) failed: Internal failure
Dec 06 08:46:28 deliver([email protected]): Error: userdb
lookup([email protected]) failed: Internal failure
Later on, when it retries, everything goes according to plan and delivery is
achieved.
Now this is actually due to slapd saying:
Dec 6 08:46:25 vmx15.unix slapd[3958]: [ID 763815 local4.debug]
connection_input: conn=72057 deferring operation: too many executing
And since it happens most frequently with mailing-lists, or mails with many
recipients, I would guess it is due to a large number of lookups happening in a
very short time.
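
To check that guess, the two logs can be bucketed per second and compared. A
rough sketch in Python, assuming the log formats shown above; the function name
and the shortened sample lines are only for illustration:

```python
import re
from collections import Counter

# Leading syslog-style timestamp, e.g. "Dec 06 08:46:28" or "Dec  6 08:46:25".
TS = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")

def count_per_second(lines, needle):
    """Count lines containing `needle`, keyed by (month, day, hh:mm:ss)."""
    counts = Counter()
    for line in lines:
        m = TS.match(line)
        if m and needle in line:
            counts[(m.group(1), int(m.group(2)), m.group(3))] += 1
    return counts

# Sample lines modelled on the excerpts above (addresses shortened to "user").
deliver_log = [
    "Dec 06 08:46:25 deliver(user): Info: msgid=<x>: saved mail to INBOX",
    "Dec 06 08:46:28 deliver(user): Error: userdb lookup(user) failed: Internal failure",
    "Dec 06 08:46:28 deliver(user): Error: userdb lookup(user) failed: Internal failure",
]
slapd_log = [
    "Dec  6 08:46:25 vmx15.unix slapd[3958]: [ID 763815 local4.debug] "
    "connection_input: conn=72057 deferring operation: too many executing",
]

failures = count_per_second(deliver_log, "failed: Internal failure")
deferrals = count_per_second(slapd_log, "deferring operation: too many executing")

# Print both counts side by side for every second that appears in either log.
for key in sorted(set(failures) | set(deferrals)):
    print(key, "deliver failures:", failures[key], "slapd deferrals:", deferrals[key])
```

If the slapd deferrals consistently show up a few seconds before the deliver
failures, that would support the burst theory.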
Since dovecot_destination_recipient_limit = 1, I believe 'deliver' is only ever
called with a single recipient for "-d", and 'deliver' probably does not query
LDAP for any of the other "To:" addresses in the message headers. Is that the
case?
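
If so, each recipient costs one 'deliver' process and, presumably, one userdb
lookup, so the burst size is easy to estimate. As far as I understand the
Postfix documentation, a recipient limit of 1 makes the destination concurrency
limit apply per recipient rather than per domain, which would leave the process
limit of 70 in master.cf as the effective cap. A back-of-the-envelope sketch in
Python (the function name is made up, and one lookup per 'deliver' is my
assumption):

```python
def peak_userdb_lookups(recipients, pipe_process_limit=70):
    """Upper bound on simultaneous userdb lookups for a burst of deliveries.

    Assumes one deliver process per recipient (recipient limit of 1) and
    one userdb lookup per deliver process; both are assumptions.
    """
    return min(recipients, pipe_process_limit)

for n in (1, 10, 70, 500):
    print(n, "recipients ->", peak_userdb_lookups(n), "concurrent lookups at most")
```

So a list mail with a few hundred local recipients could mean up to 70 lookups
hitting dovecot-auth at nearly the same moment on one server.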
Secondly, the dovecot-ldap.conf for dovecot-auth has:
hosts = 127.0.0.1 172.20.12.33 172.20.12.23 172.20.12.113
So even though localhost's slapd was busy at the time, the other three hosts
were definitely not. Is LDAP fail-over ... failing... in this case? How many
concurrent queries does dovecot-auth perform? Any way to tweak this value?
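
On the fail-over question, my working theory (not confirmed against the
dovecot-auth source) is that the host list is only walked when a connection
attempt fails, so a slapd that is busy but still accepting connections never
triggers a move to the next host. A toy model of that assumed behaviour in
Python; pick_host and accepts_connections are hypothetical names, not anything
from Dovecot or OpenLDAP:

```python
def pick_host(hosts, accepts_connections):
    """Return the first host that accepts a connection, trying hosts in order."""
    for host in hosts:
        if accepts_connections(host):
            return host
    return None

hosts = ["127.0.0.1", "172.20.12.33", "172.20.12.23", "172.20.12.113"]

# Busy localhost slapd: it still accepts connections, it just defers operations,
# so under this model the first host keeps being chosen.
print(pick_host(hosts, lambda h: True))

# Localhost slapd down entirely: only now is the next host in the list used.
print(pick_host(hosts, lambda h: h != "127.0.0.1"))
```

If that model is right, the three idle servers would never see any of the load
as long as the local slapd keeps answering.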
Admittedly, postfix/dovecot does handle this situation correctly: it is treated
as a 'temporary failure' and mail delivery is merely delayed. But at the same
time, it *could* be something with an easy fix as well.
Thoughts?
Lund
--
Jorgen Lundman | <[email protected]>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)