Tom Brown wrote:
> Hi
>
> I need to design and then build a clustered setup that is scalable that
> will distribute our MTA's across 2 of our datacentres. I have 4 boxes so
> there will be 2 in each location, more should be able to be added later
> if required. I will be configuring this in a master/standby config
> rather than balancing the load between them. I am thinking about running
> a linux ha cluster on these boxes and just treating all 4 as 2 sets of 2.
>
> I guess my questions are does exim play nicely in a linux ha type
> situation and if not what other ways can be employed to maintain a ha
> cluster of mta's ?
>
> thanks
I cannot answer the Linux HA part (FreeBSD here). But, w/r '...other
ways...', Exim is *golden* - especially w/r managing ssl/tls certs and such.

The concept of hot & standby is sound, with or without an actual cluster.
When co-located and in the same IP block, it is dead-easy to manage two
'ordinary' boxen w/r failover & restoral, so a formal cluster has just not
been an issue in our camp.

- Syncing the message store is the only real challenge, and that is not a
show-stopper.

- Conventional secondary and subsequent MX are not 100% predictable as to
where inbound traffic may end up, and getting it to where it may be read
by pop or imap w/o the users needing to alter MUA settings can increase
complexity, raise box-count, and add latency. We have chosen to publish
just one MX.

- It is faster and 'cleaner' to repoint BOTH smtp and pop/imap to the
standby by means of IP-takeover than by DNS changes. No MUA changes
required.

- IMNSHO, maintenance of a 'prime' and 'secondary' is less work,
especially when each is really a 'prime' that can carry double, and is in
day-to-day service so you know it has not gone off - or out-of-date -
while sitting on standby.

Our approach:

Each of two 'heavy' 2U servers (Tyan MB, dual Gig-E, Core-Duo CPU, 4 GB
RAM, triple RAID1 arrays) has an 'always mine' frontside IP - primarily
for ssh access. Each also has a 'public' IP which may be downed and taken
over by the other box. This is where the DNS points each 'set' of domains.

- These two (or more) IPs are aliased onto the same 'external' NIC.

- On another NIC, each has an 'internal' IP on a backside LAN. Primary use
is data exchange & local storage, but it also serves for ssh from another
box if/as/when both frontside IPs are wanted offline.

- *Normally* each server handles its own separate set of virtual domains
(per what the DNS points to), ergo nothing is really 'hot' or 'standby' -
just two lightly-loaded servers, each with more than enough reserve to do
the entire job of both.
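For concreteness, the floating 'public' IP arrangement above can be
sketched as below. The NIC name, addresses, and netmask are assumptions,
not taken from the original post (FreeBSD-style ifconfig shown, Linux
equivalent in a comment), and the script only *prints* the commands so
they can be reviewed before being fed to a root shell:

```shell
#!/bin/sh
# Sketch of claiming/dropping the floating 'public' service IP.
# Hypothetical NIC name and addresses - substitute your own.
# Commands are echoed, not executed; pipe to a root shell to apply.

NIC="em0"                  # 'external' NIC (FreeBSD name; e.g. eth0 on Linux)
PUBLIC_IP="192.0.2.10"     # floating service IP - where the DNS points
NETMASK="255.255.255.0"

takeover() {  # claim the public IP (normal service, or failover)
  echo "ifconfig $NIC alias $PUBLIC_IP netmask $NETMASK"
  # Linux equivalent: echo "ip addr add $PUBLIC_IP/24 dev eth0"
}

release() {   # drop the alias (handing the IP back, or going offline)
  echo "ifconfig $NIC -alias $PUBLIC_IP"
}

takeover
release
```

The same two operations cover both the normal case (each box carrying its
own public IP) and failover (the survivor running takeover for the downed
box's IP as well).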
The configs are identical, and both servers have each other's certs
available. The virtual-user DB (PostgreSQL in this case, but it need not
be so) is also identical, i.e. each server has all the data and storage
structures it needs to do BOTH sets of domains and virtual users for smtp
and IMAP.

The 'master' DB is on a third, 1U Via C3 single-RAID1 box, which does not
have to be on the same site (though ours are). Draws about 12W or less and
lasts a long, long time. Changes to the user DB may be made here. If it is
hors de combat, traffic is not significantly affected, other than as to
new users or spam-filter preferences. Manual changes may be made directly
to the two main boxen's DBs if there is a long outage.

Day-to-day syncing:

For light loading, periodic rsync may be 'good enough' to keep the
message stores on both boxes reasonably current. NB: Our users ordinarily
also have local sync'ed IMAP copies, so even if the mailstore on the
servers is not current, they will still be able to refer to older
messages. Shared external NAS with RAID storage can reduce that need, but
becomes a single point of failure.

Failover:

- Alias the 'public' IP of the offline box to the survivor. Make sure the
offline box does not come back on the net until you are ready for it.

Recovery:

- Drop that alias when the offline box is ready to go back to work. It may
be tested on the 'private' IP before the 'public' IP is turned back on.
(We allow ip_literal for postmaster and such...)

All else is already in place. Given the reliability of modern hardware and
RAID (a hard failure about once every 3 to 5 years), this seems to make
better use of the resources before they go obsolete while on standby, and
has allowed us to cut our server count and UPS power budget roughly in
half. One of the major driving factors was to not have any significant
interruption in IMAP access (we do not offer pop). IOW - transparency, and
de facto redundancy, but with minimal 'idle' investment.
I don't know if a Linux HA cluster would make this simpler - or more
complex - that probably depends on the expertise of the implementor. But
at least I can say that a cluster is not mandatory [1].

HTH,

Bill

[1] I've 'presumed' a Linux environment that need not be taken offline for
application of upgrades/patches, i.e. one that is normally run through a
45-60 second reboot only after a no-more-than-quarterly or bi-annual 'make
buildworld/kernel' cycle - or whatever the Linux equivalent is. Exim and
other upgrades and patches are done without going offline, the listener
daemon re-HUPped in a few seconds.

-- 
## List details at http://www.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
