On 2009/10/02 13:00 +0200, Paul [[email protected]] wrote:

Hi,

> 1. Why doesn't it retry during that 8 hour period?  Surely the 
> successful send at 20:54 should reset the retry rules?
To be honest, I'm fairly new to Exim so I can't promise to be of much 
assistance. Having said that, check out chapter 32 of the Exim4 spec 
available at 
http://www.exim.org/exim-html-current/doc/html/spec_html/ch32.html . It 
describes the retry function in detail.

The bit I noticed near the top was

"Exim's retry processing in this case is applied on a per-host 
(strictly, per IP address) basis, not on a per-message basis. Thus, if 
one message has recently been delayed, delivery of a new message to the 
same host is not immediately tried, but waits for the host's retry time 
to arrive. If the retry_defer log selector is set, the message "retry 
time not reached" is written to the main log whenever a delivery is 
skipped for this reason. "

Based on the log extracts provided, the retry configuration and your 
description of the time-line of events, I would guess that the initial 
failed message (due to timeout) was to an IP which remained unavailable 
for an extended period of time. The successful delivery was to a 
different (reachable) IP address, so it does not affect the retry values 
recorded for the IP that failed. The "retry time not reached" messages 
are for _new_ messages (you did say it was a high volume server) 
delivered into the queue, for which routing lookups returned the 
"failed" IP address.
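
To make that guess concrete, here is a rough Python sketch of per-IP 
retry bookkeeping as I read the quoted paragraph. It is a toy model 
only, not Exim's actual retry database: the function names, the IP 
addresses (documentation addresses), the date and the flat 8 hour delay 
are all made up for illustration.

```python
from datetime import datetime, timedelta

# Toy model only: a per-IP retry table, NOT Exim's real retry database.
retry_until = {}  # IP address -> earliest time of the next delivery attempt


def record_failure(ip, now, delay):
    """Remember that `ip` failed and should not be tried before now + delay."""
    retry_until[ip] = now + delay


def record_success(ip):
    """A successful delivery clears the entry for that IP only."""
    retry_until.pop(ip, None)


def may_try(ip, now):
    """True if no retry time is recorded for `ip`, or if it has arrived."""
    return now >= retry_until.get(ip, now)


# Hypothetical time-line: the date and the 8 hour delay are assumptions.
t0 = datetime(2009, 10, 1, 20, 0)
record_failure("192.0.2.1", t0, timedelta(hours=8))  # timeout on one IP
record_success("192.0.2.2")                          # delivery via the other IP

later = datetime(2009, 10, 1, 21, 0)
print(may_try("192.0.2.1", later))  # False: still inside the retry window
print(may_try("192.0.2.2", later))  # True: this IP never failed
```

The point of the sketch is that the success is recorded against a 
different key, so it cannot reset the entry for the IP that timed out.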

The above is merely a guess. The log snippets you provided seem to me to 
be somewhat obfuscated. Please provide exact log extracts, along with 
the output of exim -bP (which shows the runtime configuration values 
for Exim). I would need to do some further reading before saying more.

However, this may all be moot because...
> 2. Does setting route_data to an A record with multiple IPs achieve 
> the redundancy I'm looking for?  As far as I can tell, exim makes no 
> attempt to fall back on the second IP after the connection failure: it 
> hadn't seen a connection failure on the other IP for around 3 hours 
> prior to going into "won't send any mail" mode.
Short answer: I'm not sure.

--snip--
mail1:/usr/share/doc/exim4# host -t MX mythic-beasts.com
mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com.
mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com.
--snip--

The above indicates MX hosts with identical priority, yet different 
host-names and IP addresses. If it's just redundancy you want, I would 
ask why you don't simply have a primary and secondary MX with differing 
priority values.

Also, why are you using just the one host-name in the router 
configuration instead of adding both host-names to the route_data 
value? See section 20.1 at 
http://www.exim.org/exim-html-current/doc/html/spec_html/index.html#toc0194

Not falling back on the "other" IP seems either like an artifact of some 
kind of look-up caching or a result of using the manualroute router 
without a route_list. Again, this is a guess.
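
For what it's worth, the behaviour I would expect from a working 
fallback is roughly the following. This is a plain Python sketch of the 
idea, with a made-up pick_host helper, made-up addresses and times in 
minutes since midnight. It is not a description of what Exim actually 
does internally.

```python
def pick_host(candidates, retry_until, now):
    """Return the first candidate whose retry time has arrived, else None."""
    for ip in candidates:
        # An IP with no entry has never failed, so it is always eligible.
        if now >= retry_until.get(ip, now):
            return ip
    return None


# Hypothetical data: times are minutes since midnight.
hosts = ["192.0.2.1", "192.0.2.2"]
retry_until = {"192.0.2.1": 600}  # first IP is on hold until 10:00

print(pick_host(hosts, retry_until, now=540))  # 192.0.2.2

retry_until["192.0.2.2"] = 660    # now both IPs are on hold
print(pick_host(hosts, retry_until, now=540))  # None
```

If only a single address ever ends up in the candidate list, or every 
address in it is on hold, nothing is left to fall back on, which would 
look a lot like your "won't send any mail" mode. Whether that is what 
happened here is something the exact logs would have to confirm.
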
> I'm separately trying to get to the bottom of why we're seeing the 
> connection refusal in the first place, but I'd like to understand why 
> our setup isn't as robust as I think it should be.
>
> many thanks,
>
> Paul
All of the above may or may not be of use - I am certainly no Exim 
expert. My only hope is that it doesn't lead you down a rabbit-hole ;-)

Ciao,
Sven
-- 
## List details at http://lists.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/