We do it with three physical machines across two physical sites:

    relay1.thorcom.net    Worcester
    relay2.thorcom.net    Worcester
    relay3.thorcom.net    Uxbridge

All three machines run the same config.

Domains that we allow to relay have their MX records set so that relay1 and relay2 are load balanced and relay3 is the fallback, simply by putting:

    IN    MX    10    relay1.thorcom.net.
    IN    MX    10    relay2.thorcom.net.
    IN    MX    50    relay3.thorcom.net.

in the DNS zone file(s).

When a sending MTA or MUA looks up the MX for the domain it gets two equal choices (relay1 and relay2) and randomly picks one, so roughly half the traffic comes in on each of relay1 and relay2.

If either relay1 or relay2 fails or is taken offline for an upgrade, all the traffic goes via the remaining one.

If both relay1 and relay2 go offline (ISP comms loss; power loss) then relay3 is still there as the fallback.
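
For anyone who wants to see the selection logic spelled out, here is a minimal Python sketch of how a sender typically works through those MX records: lowest preference first, random order among equal preferences, moving on to the next host when one is unreachable. The function names and the is_reachable callback are made up for illustration; this is not what any particular MTA actually runs.

```python
import random
from itertools import groupby

# MX records from the zone file above: (preference, host)
MX_RECORDS = [
    (10, "relay1.thorcom.net."),
    (10, "relay2.thorcom.net."),
    (50, "relay3.thorcom.net."),
]


def delivery_order(records, rng=random):
    """Return hosts lowest preference first, shuffled within each equal-preference group."""
    ordered = []
    for _, group in groupby(sorted(records), key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # equal-preference hosts are tried in random order
        ordered.extend(hosts)
    return ordered


def pick_relay(records, is_reachable, rng=random):
    """Return the first reachable host in delivery order, or None if all are down.

    is_reachable is a hypothetical callback standing in for a real SMTP connect.
    """
    for host in delivery_order(records, rng):
        if is_reachable(host):
            return host
    return None
```

Calling pick_relay(MX_RECORDS, lambda host: host.startswith("relay3")) models the case where both Worcester machines are down, and it returns relay3.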

This approach needs no load balancers and has not skipped a beat in 15+ years.


Mike


On 8/16/2016 12:24 PM, Mike Brudenell wrote:
Hi, Peter -

If you're asking what I think you are, we operate a similar setup.

As you say, Round-Robin isn't a great solution as there can be lengthy
delays before the client gives up on an IP address and tries the other.
Also some clients might only try the first of the set of IPs and not bother
trying others if they can't connect.

We use two VMs running Exim under Ubuntu, although you can have as many as
you want. These act as our mail relays, accepting incoming messages and
routing them onward to the delivery hosts/wherever.

These have a load balancer front end: a pair of servers running the (free)
Linux Virtual Server (LVS) load balancer software under Ubuntu. These poll
the Exim services on ports 25, 465 and 587 and route connections to both
servers using a Least Connections policy.
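
To illustrate the Least Connections idea, here is a simplified Python sketch: each new connection goes to the healthy backend that currently has the fewest active connections. The backend names and the least_connections function are hypothetical, and LVS's own scheduler is more involved than this (it also accounts for server weights and inactive connections).

```python
def least_connections(active, healthy):
    """Pick the healthy backend with the fewest active connections.

    active:  dict mapping backend name -> current active connection count
    healthy: set of backend names that passed the last health check
    Returns None when no backend is healthy.
    """
    candidates = [name for name in active if name in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda name: active[name])


# Hypothetical backends, for illustration only.
connections = {"exim-a": 12, "exim-b": 7}
chosen = least_connections(connections, healthy={"exim-a", "exim-b"})
```

With both backends healthy the sketch picks exim-b; if exim-b drops out of the healthy set, everything goes to exim-a.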

If one of the backend servers goes down the LVS quickly notices — we have
it set to do its health check poll every 5–10 seconds: I can't remember
which — and routes incoming traffic to the remaining server.
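
As a rough idea of what such a health check amounts to, here is a Python sketch that treats a backend as up only if a TCP connection succeeds on all three ports. It is an assumption-laden illustration, not how LVS itself is configured: the function names and the 3-second timeout are invented, and a plain TCP connect only shows that something is listening, not that Exim is answering SMTP properly.

```python
import socket

# Ports the load balancer polls, as described above.
PORTS = (25, 465, 587)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure
        return False


def backend_is_healthy(host, ports=PORTS, timeout=3.0, probe=port_is_open):
    """Return True only if every port on the backend accepts a connection.

    probe can be swapped for a stricter check, e.g. one that reads the SMTP banner.
    """
    return all(probe(host, port, timeout) for port in ports)
```

A poller would call backend_is_healthy for each backend every few seconds and drop any that fail from the pool until they pass again.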

We publish the DNS name and IP address of the front end load balancer and
clients connect to this, which then routes the connection onward.

Our backend Exim servers are separate, standalone systems; they *do not* share
a filestore with a single queue, but instead each has its own. This
means that if one server goes down then any messages held in its queue are
delayed until it's brought back up. However because we're using VMs instead
of physical hardware this can be done quickly within our VMware estate. But
if you are still running physical hardware then it's something you need to
be aware of as it can take a while to get spare parts delivered/fitted.

Cheers,
Mike B-)


On 16 August 2016 at 12:07, Peter Leeman <[email protected]> wrote:

We have an Exim server that only acts as a relay server and does not
hold/manage recipient mailboxes.  Can anyone suggest a (free but good)
method for implementing failover for this box.

There needs to be a level of intelligence to determine if the server is up
or down as a lot of the emails that go through this relay are generated by
scripts or multifunction printers that don't hold emails for resending.  I
was thinking about DNS round-robin but this would not be suitable.  Also
some of the applications that generate mail have to use IP addresses and
will not accept DNS names.

Regards,

Peter

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/




