On 2015-03-13 23:50, Bron Gondwana wrote:
> So I've been doing a lot of thinking about Cyrus clustering, with the
> underlying question being "what would it take to make FastMail run a
> murder".  We've written a fair bit about our infrastructure - we use
> nginx as a frontend proxy to direct traffic to backend servers, and have
> no interdependencies between the backends, so that we can scale
> indefinitely.  With murder as it exists now, we would be pushing the
> limits of the system already - particularly with the globally
> distributed datacentres.

> Why would FastMail consider running murder, given our existing
> nice system?
>
> a) we support folder sharing within businesses, so at the moment we are
>    limited by the size of a single slot.  Some businesses already push
>    that limit.


How, though, do you "ensure" that the mailbox for a new user in such a business is created on the same backend as the mailboxes of all the other users in that business?
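To make the question concrete, a placement routine would have to look something like the following sketch (illustrative Python only -- none of these names or data structures are Cyrus API): co-locate with the domain's existing mailboxes, or fall back to the least-loaded backend for a brand-new domain.

    from collections import Counter

    def pick_backend(domain, mailbox_locations, backends):
        # mailbox_locations: {"domain!user.name": "backend"} for existing mailboxes
        for name, backend in mailbox_locations.items():
            if name.startswith(domain + "!"):
                return backend          # co-locate with the existing business users
        # No mailboxes for this domain yet: pick the least-loaded backend.
        load = Counter(mailbox_locations.values())
        return min(backends, key=lambda b: load[b])

    locations = {"example.com!user.anna": "imap3", "other.org!user.bob": "imap1"}
    print(pick_backend("example.com", locations, ["imap1", "imap2", "imap3"]))
    # -> "imap3"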

> Here are our deal-breaker requirements:
>
> 1) unified murder - we don't want to run both a frontend AND a backend
>    imapd process for every single connection.  We already have nginx,
>    which is non-blocking, for the initial connection and auth handling.


There's one particular "problem" with using NGINX as the IMAP proxy -- it requires an external lookup service that responds with the address of the backend to proxy the connection to.

I say "problem" in quotes to emphasize I use the term "problem" very loosely -- whether it be a functioning backend+mupdate+frontend or a functioning backend+mupdate+frontend+nginx+service is a rather futile distinction, relatively speaking.

> 2) no table scans - anything that requires a parse and ACL lookup for
>    every single row of mailboxes.db is going to be a non-starter when
>    you multiply the existing mailboxes.db size by hundreds.


I don't understand how this is an established problem already -- or not as much as I probably should. If 72k users can be happy on a murder topology, surely 4 times as many could also be happy -- inefficiencies notwithstanding, since they're "only" a vertical scaling limitation.

That said, of course I understand it has its upper limit, but pushing updated in-memory lookup tables out to the nodes whenever an update happens would seem to resolve the problem, no?

> 3) no single-point-of-failure - having one mupdate master which can stop
>    the entire cluster working if it's offline, no thanks.


This is not necessarily what a failed mupdate server causes, though -- new folder creations, folder renames (which include deletions!) and folder transfers won't work, but the cluster remains functional under both the SMTP-to-backend and the LMTP-proxy-via-frontend topology. That is notwithstanding autocreate for Sieve fileinto, and notwithstanding mailbox hierarchies distributed over multiple backends when also using the SMTP-to-backend topology.

> Thankfully, the state of the art in distributed databases has moved a
> long way since mupdate was written.

I have also written a one-or-two-line patch that enables backends that replicate to both be part of the same murder topology, by preventing the replica "slave" from bailing out on the initial creation of a mailbox when it consults mupdate and finds the mailbox already exists.
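Illustratively (this is not the actual Cyrus code, just the gist of the patch): when the mupdate entry for the mailbox being created belongs to the replica's own replication partner, carry on rather than treating it as an error.

    def replica_mailbox_create(mailbox, mupdate_entries, replication_pair):
        owner = mupdate_entries.get(mailbox)
        if owner is not None:
            if owner in replication_pair:
                return "ok"     # our own master already holds it; not an error
            return "error: mailbox exists on unrelated backend %s" % owner
        mupdate_entries[mailbox] = replication_pair[0]  # normal reservation path
        return "ok"

    entries = {"example.com!user.anna": "imapa"}
    print(replica_mailbox_create("example.com!user.anna", entries,
                                 ("imapa", "imapb")))
    # -> "ok" instead of bailing out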

> Along with this, we need a reverse lookup for ACLs, so that any one
> user doesn't ever need to scan the entire mailboxes.db.  This might be
> hooked into the distributed DB as well, or calculated locally on each
> node.


I reckon this may be the "rebuild more efficient lookup trees, in-memory or otherwise" idea I referred to just now, just not in so many words.
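The sort of inverted index I have in mind would look something like the following (illustrative Python, not Cyrus internals): map each identifier to the mailboxes it can see, update the map whenever an ACL changes, and a LIST never has to scan all of mailboxes.db.

    from collections import defaultdict

    class AclIndex:
        def __init__(self):
            self.by_identifier = defaultdict(set)  # identifier -> visible mailboxes
            self.by_mailbox = defaultdict(set)     # mailbox -> identifiers in its ACL

        def set_acl(self, mailbox, acl):
            # acl: {identifier: rights}; replaces the mailbox's ACL wholesale.
            for ident in self.by_mailbox.pop(mailbox, set()):
                self.by_identifier[ident].discard(mailbox)
            for ident in acl:
                self.by_identifier[ident].add(mailbox)
                self.by_mailbox[mailbox].add(ident)

        def visible_mailboxes(self, identifier):
            # Cost proportional to the answer, not to the size of mailboxes.db.
            return self.by_identifier[identifier]

    index = AclIndex()
    index.set_acl("example.com!user.john",
                  {"john@example.com": "lrswipkxtecda", "group:sales": "lrs"})
    print(index.visible_mailboxes("group:sales"))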

> And that's pretty much it.  There are some interesting factors around
> replication, and I suspect the answer here is to have either
> multi-value support or embed the backend name into the mailboxes.db key
> (postfix) such that you wind up listing the same mailbox multiple
> times.

In a scenario where only one backend is considered "active" for a given (set of) mailbox(es) and the other is "passive", this has for us been more a matter of a one-line patch to mupdate plus the proper DNS/keepalived-type failover infrastructure for service IP addresses than one of allowing duplicate entries and suppressing them.
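A sketch of how the "backend name as a key postfix" idea from the quoted paragraph would combine with such an active/passive flag (the separator and record layout are made up for illustration):

    SEP = "\x1f"  # hypothetical postfix separator inside the mailboxes.db key

    db = {
        "example.com!user.anna" + SEP + "imapa": {"active": True},
        "example.com!user.anna" + SEP + "imapb": {"active": False},
    }

    def locations(mailbox):
        # One entry per (mailbox, backend): the same mailbox listed multiple times.
        prefix = mailbox + SEP
        return {key[len(prefix):]: meta
                for key, meta in db.items() if key.startswith(prefix)}

    def active_backend(mailbox):
        # With service-IP failover this reduces to following DNS/keepalived;
        # here we simply pick the entry flagged active.
        for backend, meta in locations(mailbox).items():
            if meta["active"]:
                return backend
        return None

    print(locations("example.com!user.anna"))       # both replicas listed
    print(active_backend("example.com!user.anna"))  # -> "imapa"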

Kind regards,

Jeroen van Meeuwen

--
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +41 79 951 9003
w: https://kolabsystems.com

pgp: 9342 BF08
