From my phone, so excuse brevity and top-posting, but Fastmail running
murder would be a huge bonus. I not-so-fondly recall the intimate
relationship I developed with gdb debugging murder issues when we
upgraded from 2.3 to 2.4 :)

Sent via the Samsung GALAXY S® 5, an AT&T 4G LTE smartphone

-------- Original message --------
From: Bron Gondwana <br...@fastmail.fm>
Date: 03/13/2015 6:50 PM (GMT-05:00)
To: Cyrus Devel <cyrus-devel@lists.andrew.cmu.edu>
Cc:
Subject: What would it take for FastMail to run murder

So I've been doing a lot of thinking about Cyrus clustering, with the
underlying question being "what would it take to make FastMail run a
murder".

We've written a fair bit about our infrastructure - we use nginx as a
frontend proxy to direct traffic to backend servers, and have no
interdependencies between the backends, so that we can scale
indefinitely. With murder as it exists now, we would be pushing the
limits of the system already - particularly with the globally
distributed datacentres.

Why would FastMail consider running murder, given our existing nice
system?

a) we support folder sharing within businesses, so at the moment we
   are limited by the size of a single slot. Some businesses already
   push that limit.

b) it's good to dogfood the server we put so much work into.

Here are our deal-breaker requirements:

1) unified murder - we don't want to run both a frontend AND a backend
   imapd process for every single connection. We already have nginx,
   which is non-blocking, for the initial connection and auth handling.

2) no table scans - anything that requires a parse and ACL lookup for
   every single row of mailboxes.db is going to be a non-starter when
   you multiply the existing mailboxes.db size by hundreds.

3) no single-point-of-failure - having one mupdate master which can
   stop the entire cluster working if it's offline, no thanks.

Thankfully, the state of the art in distributed databases has moved a
long way since mupdate was written. We'd have to at least change the
mupdate protocol anyway to handle newly added fields, so why not just
do away with it and have every server run a local node of a
distributed database protocol for its mailboxes.db.

Along with this, we need a reverse lookup for ACLs, so that any one
user doesn't ever need to scan the entire mailboxes.db. This might be
hooked into the distributed DB as well, or calculated locally on each
node.

And that's pretty much it. There are some interesting factors around
replication, and I suspect the answer here is to have either
multi-value support or embed the backend name into the mailboxes.db
key (postfix) such that you wind up listing the same mailbox multiple
times. We already suppress duplicates in the LIST command, so all we
need then is logic for choosing the actual master. Rob N has done some
work with consul and etcd already at FastMail, and we would use either
that or a flag in the distributed DB to drive master choice for
backend connection purposes.

There are a bunch of "nice to have"s on top of this, but I think this
would be enough for us to convert our existing standalone servers over
to a murder.

Bron.

--
  Bron Gondwana
  br...@fastmail.fm
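
A minimal sketch of the two ideas in the message above (a reverse ACL
index so a user never scans the whole table, and the backend name
embedded as a key postfix with duplicates suppressed at LIST time),
written in Python purely for illustration. Nothing here is Cyrus code:
the class MailboxDirectory, the "|" separator, the method names and
the `masters` mapping (standing in for whatever consul, etcd or a flag
in the distributed DB would supply) are all assumptions.

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator between mailbox name and backend postfix


class MailboxDirectory:
    """Toy in-memory stand-in for mailboxes.db plus two secondary indexes."""

    def __init__(self):
        self.records = {}                   # "mailbox|backend" -> {userid: rights}
        self.by_user = defaultdict(set)     # userid -> {"mailbox|backend", ...}
        self.replicas = defaultdict(set)    # mailbox -> {backend, ...}

    def put(self, mailbox, backend, acl):
        """Store one replica's record and keep both indexes in step."""
        key = f"{mailbox}{SEP}{backend}"
        # Remove reverse entries left over from the previous ACL, if any.
        for user in self.records.get(key, {}):
            self.by_user[user].discard(key)
        self.records[key] = dict(acl)
        for user in acl:
            self.by_user[user].add(key)
        self.replicas[mailbox].add(backend)

    def list_for(self, user):
        """Mailboxes the user appears in an ACL for, one entry per name.

        Only the user's own reverse-index entries are touched, and the
        per-backend duplicates collapse because names go through a set.
        """
        keys = self.by_user.get(user, ())
        return sorted({key.rsplit(SEP, 1)[0] for key in keys})

    def replicas_of(self, mailbox):
        """Backends holding a copy of the mailbox."""
        return sorted(self.replicas.get(mailbox, ()))

    def pick_master(self, mailbox, masters):
        """Return the externally chosen master if it really holds a copy.

        `masters` is a plain dict here; it is a placeholder for an
        external coordinator and not a real consul or etcd client.
        """
        choice = masters.get(mailbox)
        return choice if choice in self.replicas.get(mailbox, ()) else None
```

With the same mailbox stored under two backends, list_for() reports it
once, and pick_master() only accepts a backend that actually holds a
replica. A real implementation would also have to keep the reverse
index consistent across nodes, which this sketch does not attempt.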