We have three servers. One server generates a lot of mail and uses a pair of servers as a smart host. The two servers are addressed under the same name (mx.mythic-beasts.com), so the config on the sending server looks like this:

    smarthost:
      driver = manualroute
      route_data = mx.mythic-beasts.com
      transport = remote_smtp

where:

    $ host mx.mythic-beasts.com
    mx.mythic-beasts.com has address 93.93.131.52
    mx.mythic-beasts.com has address 93.93.130.6

Every now and again, exim on the sending server decides that it can't send mail, and starts queuing mail. Looking at the logs, it appears to be triggered by a connection time out:

    2009-09-29 20:26:59 1MsiIf-0002cW-Ps == [email protected] R=smarthost T=remote_smtp defer (110): Connection timed out

and that will then be followed by lots of non-retries:

    2009-09-29 20:26:59 1MsiLb-0003f7-DW == [email protected] R=smarthost T=remote_smtp defer (-53): retry time not reached for any host

Exim then appears to refuse to retry for an unreasonably long period of time. For example, exim successfully sends a mail at 20:54. It then receives a number of time outs up to 20:58. Then, it does not appear to retry until 04:57 the following morning, despite logging a "defer (-53): retry time not reached for any host" many times every minute for the whole of that period.

Our retry configuration says:

    begin retry

    # Only retry bounce delivery once every 12 hours, for 4 days.
    * * senders=: F,4d,12h

    # Everything else, try once every 15 minutes for 12 hours, then once an hour,
    # increasing by 150% each time, for 16 hours; then once every 8 hours for 4
    # days.
    * * F,12h,15m; G,16h,1h,1.5; F,4d,8h

A couple of questions:

1. Why doesn't it retry during that 8 hour period? Surely the successful send at 20:54 should reset the retry rules?

2. Does setting route_data to an A record with multiple IPs achieve the redundancy I'm looking for? As far as I can tell, exim makes no attempt to fall back on the second IP after the connection failure: it hadn't seen a connection failure on the other IP for around 3 hours prior to going into "won't send any mail" mode.
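
To make the expected behaviour concrete, here is a minimal Python sketch of the schedule I would expect from the last retry rule above. It is a simplified model, not Exim's actual algorithm: it assumes each cutoff is measured from the first failure, that a G stage uses the larger of its starting interval and the previous interval times the factor, and that every retry fails and is attempted exactly when it becomes due. The names RULES and retry_offsets are made up for illustration.

```python
from datetime import timedelta

# Hypothetical model of the rule "F,12h,15m; G,16h,1h,1.5; F,4d,8h".
# Each entry: (kind, cutoff measured from first failure, base interval, factor).
RULES = [
    ("F", timedelta(hours=12), timedelta(minutes=15), 1.0),
    ("G", timedelta(hours=16), timedelta(hours=1), 1.5),
    ("F", timedelta(days=4), timedelta(hours=8), 1.0),
]


def retry_offsets(rules):
    """Return the offsets from the first failure at which retries become due.

    Assumes every attempt fails and is made exactly when it is due.
    """
    offsets = []
    elapsed = timedelta(0)
    prev_interval = timedelta(0)
    final_cutoff = rules[-1][1]
    while True:
        # Use the first stage whose cutoff has not yet been reached.
        rule = next((r for r in rules if elapsed < r[1]), None)
        if rule is None:
            break
        kind, _cutoff, base, factor = rule
        if kind == "G":
            # Geometric stage: grow the previous gap, but never go below base.
            interval = max(base, prev_interval * factor)
        else:
            # Fixed stage: constant gap.
            interval = base
        elapsed += interval
        if elapsed > final_cutoff:
            break
        offsets.append(elapsed)
        prev_interval = interval
    return offsets


if __name__ == "__main__":
    schedule = retry_offsets(RULES)
    previous = timedelta(0)
    for due in schedule:
        print(f"retry due at +{due} (gap {due - previous})")
        previous = due
```

Under those assumptions the longest gap the rule can produce is 8 hours, and only once the first recorded failure is more than 16 hours old, which is why an 8 hour silence straight after the successful send at 20:54 surprises me.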

I'm separately trying to get to the bottom of why we're seeing the connection timeouts in the first place, but I'd like to understand why our setup isn't as robust as I think it should be.

Many thanks,
Paul
