On 2015-03-13 23:50, Bron Gondwana wrote:
> So I've been doing a lot of thinking about Cyrus clustering, with the
> underlying question being "what would it take to make FastMail run a
> murder". We've written a fair bit about our infrastructure - we use
> nginx as a frontend proxy to direct traffic to backend servers, and
> have no interdependencies between the backends, so that we can scale
> indefinitely. With murder as it exists now, we would be pushing the
> limits of the system already - particularly with the globally
> distributed datacentres.
> Why would FastMail consider running murder, given our existing nice
> system?
>
> a) we support folder sharing within businesses, so at the moment we
> are limited by the size of a single slot. Some businesses already
> push that limit.
How, though, do you "ensure" that a mailbox for a new user in such a
business is created on the same backend as all the other users of said
business?
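
One conceivable answer, purely as a sketch (none of these names exist
anywhere): derive the backend deterministically from the business or
domain part of the address, so every user of a business lands on the
same slot.

    import hashlib

    # Hypothetical placement rule: every user of one business/domain
    # maps to the same backend, so shared folders stay on one slot.
    BACKENDS = ["imap1.example.com", "imap2.example.com"]

    def backend_for(address):
        domain = address.rsplit("@", 1)[-1].lower()
        digest = hashlib.sha1(domain.encode("utf-8")).digest()
        return BACKENDS[digest[0] % len(BACKENDS)]

A hash gives you no say over capacity, of course; in practice the
business-to-backend mapping would live in a database so it can be
rebalanced.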
> Here are our deal-breaker requirements:
>
> 1) unified murder - we don't want to run both a frontend AND a
> backend imapd process for every single connection. We already have
> nginx, which is non-blocking, for the initial connection and auth
> handling.
There's one particular "problem" with using NGINX as the IMAP proxy --
it requires an external service that responds with the address to
proxy to.

I say "problem" in quotes to emphasize I use the term very loosely --
whether it is a functioning backend+mupdate+frontend or a functioning
backend+mupdate+frontend+nginx+service is a rather futile distinction,
relatively speaking.
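
To make that concrete: the service behind nginx's auth_http directive
is a tiny HTTP endpoint. A minimal sketch, assuming a stand-in
backend_for() lookup (a real deployment would also validate the
credentials here):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def backend_for(user):
        # Stand-in: in reality this consults mupdate, LDAP or a local
        # lookup table to find the user's backend.
        return "10.0.0.11", 143

    class AuthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # nginx passes the credentials in Auth-* request headers
            # and expects the proxy destination back in headers too.
            user = self.headers.get("Auth-User", "")
            host, port = backend_for(user)
            self.send_response(200)
            self.send_header("Auth-Status", "OK")
            self.send_header("Auth-Server", host)
            self.send_header("Auth-Port", str(port))
            self.end_headers()

    HTTPServer(("127.0.0.1", 9000), AuthHandler).serve_forever()

In the murder picture, backend_for() is exactly where a mupdate (or
other) lookup would slot in.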
> 2) no table scans - anything that requires a parse and ACL lookup
> for every single row of mailboxes.db is going to be a non-starter
> when you multiply the existing mailboxes.db size by hundreds.
I don't understand how this is an established problem already -- or not
as much as I probably should. If 72k users can be happy on a murder
topology, surely 4 times as many could also be happy -- inefficiencies
notwithstanding, they're "only" a vertical scaling limitation.

That said, of course I understand it has its upper limit, but pushing
updated in-memory lookup tables there when an update happens would
seem to resolve the problem, no?
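
In sketch form (illustrative only, not existing Cyrus code), that
means keeping a per-user index over mailboxes.db and touching only the
entries affected by a pushed update:

    from collections import defaultdict

    class MailboxIndex:
        """Illustrative in-memory per-user index over mailboxes.db."""

        def __init__(self):
            self.by_user = defaultdict(set)  # userid -> mailbox names

        def apply_update(self, mailbox, acl):
            # acl: identifier -> rights string, parsed from the one
            # mailboxes.db row that just changed.
            for userid, rights in acl.items():
                if "l" in rights:  # 'l' (lookup) makes it visible
                    self.by_user[userid].add(mailbox)
                else:
                    self.by_user[userid].discard(mailbox)

        def visible(self, userid):
            # One dict lookup instead of a scan over every row.
            return self.by_user[userid]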
> 3) no single-point-of-failure - having one mupdate master which can
> stop the entire cluster working if it's offline, no thanks.
This is not necessarily what a failed mupdate server does, though --
new folders, folder renames (which include deletions!) and folder
transfers won't work, but the cluster remains functional under both
the SMTP-to-backend and the LMTP-proxy-via-frontend topology --
autocreate for Sieve fileinto notwithstanding, and mailbox hierarchies
distributed over multiple backends when also using the SMTP-to-backend
topology notwithstanding.
> Thankfully, the state of the art in distributed databases has moved
> a long way since mupdate was written.
I have also written a one-or-two-line patch that enables backends that
replicate to both be part of the same murder topology, preventing the
replica "slave" from bailing out on the initial creation of a mailbox
when it consults mupdate and finds that the mailbox already exists.
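
The shape of it, in sketch form (the helpers are hypothetical
stand-ins for Cyrus internals, not the actual patch):

    def mupdate_has(name):
        return name in {"user.bron"}  # stand-in for a mupdate lookup

    def local_create(name):
        print("creating", name)       # stand-in for the real create

    def create_mailbox(name, is_replica, in_murder):
        # Unpatched, any hit in mupdate aborts the create. Patched, a
        # replica proceeds: the entry is expected, since the active
        # backend registered the mailbox with mupdate first.
        if in_murder and mupdate_has(name) and not is_replica:
            raise RuntimeError("mailbox already exists: " + name)
        local_create(name)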
> Along with this, we need a reverse lookup for ACLs, so that any one
> user doesn't ever need to scan the entire mailboxes.db. This might
> be hooked into the distributed DB as well, or calculated locally on
> each node.
I reckon this may be the "rebuild more efficient lookup tables
in-memory or otherwise" I referred to just now, just not in so many
words.
> And that's pretty much it. There are some interesting factors around
> replication, and I suspect the answer here is to have either
> multi-value support or embed the backend name into the mailboxes.db
> key (postfix) such that you wind up listing the same mailbox
> multiple times.
In a scenario where only one backend is considered "active" for a
given (set of) mailbox(es) and the other is "passive", this has been
more a matter of a one-line patch in mupdate plus the proper failover
infrastructure for service IP addresses (DNS, keepalived and the like)
than of allowing duplicates and suppressing them.
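
The failover half of that is bog-standard keepalived: a VRRP-managed
service address that moves to the passive backend when the active one
drops out. A minimal sketch (interface, router id and address are
placeholders):

    vrrp_instance CYRUS_BACKEND {
        state MASTER            # BACKUP on the passive replica
        interface eth0
        virtual_router_id 51
        priority 100            # lower on the passive replica
        advert_int 1
        virtual_ipaddress {
            192.0.2.10          # the service IP mupdate points at
        }
    }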
Kind regards,
Jeroen van Meeuwen
--
Systems Architect, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
m: +41 79 951 9003
w: https://kolabsystems.com
pgp: 9342 BF08