On 2009/10/02 13:00 +0200, Paul wrote:

Hi,
> 1. Why doesn't it retry during that 8 hour period? Surely the
> successful send at 20:54 should reset the retry rules?

To be honest, I'm fairly new to Exim, so I can't promise to be of much assistance. Having said that, check out chapter 32 of the Exim4 spec, available at http://www.exim.org/exim-html-current/doc/html/spec_html/ch32.html. It describes the retry function in detail. The bit I noticed near the top was:

"Exim's retry processing in this case is applied on a per-host (strictly, per IP address) basis, not on a per-message basis. Thus, if one message has recently been delayed, delivery of a new message to the same host is not immediately tried, but waits for the host's retry time to arrive. If the retry_defer log selector is set, the message "retry time not reached" is written to the main log whenever a delivery is skipped for this reason."

Based on the log extracts you provided, the retry configuration and your description of the timeline of events, I would guess that the initial failed message (due to a timeout) was to an IP address that remained unavailable for an extended period. The successful delivery went to a different, reachable IP address, so it does not reset the retry data held for the failed address. The "retry time not reached" messages are for _new_ messages (you did say it was a high-volume server) arriving in the queue, for which routing lookups returned the "failed" IP address.

The above is merely a guess. The log snippets you provided seem to me to be somewhat obfuscated. Please provide exact log extracts, along with the output of exim -bP (which shows the runtime configuration values for Exim). Further reading is required. However, this may all be moot because...

> 2. Does setting route_data to an A record with multiple IPs achieve
> the redundancy I'm looking for?
> As far as I can tell, exim makes no
> attempt to fall back on the second IP after the connection failure: it
> hadn't seen a connection failure on the other IP for around 3 hours
> prior to going into "won't send any mail" mode.

Short answer: I'm not sure.

--snip--
mail1:/usr/share/doc/exim4# host -t MX mythic-beasts.com
mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com.
mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com.
--snip--

The above indicates MX hosts with identical priority, yet different host names and IP addresses. If it's just redundancy you want, I would ask why you don't simply have a primary and secondary MX with differing priority values. Also, why are you using just the one host name in the router configuration instead of adding both host names to the route_data value? See section 20.1 at http://www.exim.org/exim-html-current/doc/html/spec_html/index.html#toc0194

Not falling back on the "other" IP looks like either an artifact of some kind of lookup caching or a result of using the manualroute router without a route_list. Again, this is a guess.

> I'm separately trying to get to the bottom of why we're seeing the
> connection refusal in the first place, but I'd like to understand why
> our setup isn't as robust as I think it should be.
>
> many thanks,
>
> Paul

All of the above may or may not be of use; I am certainly no Exim expert. My only hope is that it doesn't lead you down a rabbit hole ;-)

Ciao,
Sven

-- 
## List details at http://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
