On Fri, Jul 15, 2005 at 02:02:27PM -0400, Geo Carncross wrote:
> Won't work. As much as it seems like this would be a good idea (and
> believe me: about half a dozen people on this list have had it, so
> it certainly is a good idea. better still, don't believe me, check
> the archive yourself :) )

Thanks. I've now gone through a bunch of archived messages looking to
understand this.

Suggestion in short: why not keep track of which messages have been
seen by each server, and if a server senses a potential issue (ie.
after network problems are fixed, or whatever) correct it then? [More
detail later in the message.]

As I understand it, the requirements are like so:

(R0) Dropping email is not acceptable.

(R1) Follow RFC: UIDs must be 32-bit values.

(R2) Follow RFC: UIDs must monotonically increase. In particular, a
replication that results in messages with UIDs that precede a UID
value reported to a user is bad, and must be corrected.

(R3) Each server in a replicating cluster should be read-write, ie.
multimaster updates should be allowed.

(R4) Multimaster should work gracefully in the failure scenario where
client and internet can reach each server, but servers cannot reach
each other.

Assumptions:

(A0) Duplicating email (ie. causing the user to see it a second time)
isn't so bad if it's infrequent.

(A1) Users start a new IMAP session relatively infrequently -- a new
connection is not established within seconds of a previous connection.

(A2) Users should have an affinity to a particular server, and should
only switch/be switched to another server in the event of a failure.

(A3) The most likely mode of failure is that a server is unreachable
by a user. This is the scenario that should be most engineered to not
duplicate mail.

(A4) Loss of connectivity between two or more mail servers while
both/all servers are still visible to the same users is a rare
occurrence. The mail server should meet minimal requirements (ie. not
drop mail) but some duplicate mail in this case is acceptable. This
can be mitigated by giving users an affinity for a server.

Suggestion in detail:

(S0) Each server in a multimaster cluster is assigned a unique
server_id. If multimaster isn't necessary, server_id for all servers
is 0.

(S1) Locally generate UIDs in a way that is globally unique (ie.
splitting the message sequence count using a
local_sequence * num_servers + server_id type scheme, or some similar
method.)

(S2) Each server keeps a replicated table "high_saved" of the last
locally-generated UID it's saved. Index by mailbox and server_id.

(S3) Each server keeps a replicated table "high_reported" of the last
UID it's reported to the client. Index by mailbox and server_id.

(S4) Each server keeps a replicated table, "process_message_UID",
that is basically a queue of messages to each other server asking it
to make sure a UID is OK. Index by mailbox, remote server_id, UID.

(S5) Each time an email arrives at a server (via SMTP or IMAP, not
via replication), the server generates a new UID using the scheme
from part (S1) that is greater than any value currently in
high_saved, for any server. Then, for each server_id other than
itself, it creates a row in process_message_UID. Then, update
high_saved with the new UID.

(S6) Periodically, as a maintenance thread, each server checks
process_message_UID for any message sent to its server_id. LOOP
foreach message: if the message UID is lower than the last reported
UID known for this server and mailbox, change the UID of the email to
a new UID as per (S1) and (S5), and delete the message from
process_message_UID. If greater than or equal, delete the message
from process_message_UID without taking an action.

(S7) When the user client connects and wants the last UID, first
perform step (S6). When done processing all messages in the
process_message_UID queue for that server_id, report the new high UID
to the user and update high_reported.

Examples/scenarios/analysis:

(E1) Single server, or multiple servers with a single master. Step
(S1) degenerates into a simple sequence. Step (S5) does the same.
Steps (S6) and (S7) are basically skipped, since there are no other
servers to exchange messages with, so the loops are empty. So the
server does no additional heavy lifting.

(E2) Multimaster load sharing, communication OK between servers: all
messages are assigned UIDs uniquely. In general, UIDs will increase,
but under some race conditions, a server will perceive a UID to step
back due to replication. If no client actually asked about UIDs
during the race condition, no action is taken. If a user timed things
"just right", so that message with ID N arrives on server A and
message with ID N+1 arrives on server B, and user queries B, gets
N+1, then email is replicated to B, and the replication spreads the
news that A has a message UID N for B. B should auto-sense the
problem (the next time the user queries B, or the next time B does
its maintenance check) and B should update the UID to something
beyond the current known max. So, if the second client query is to B,
the client will automatically correct. If the second client query is
to A, the server will initially give an old UID, but then B will
correct it. If the configuration is such that clients prefer their
last server or a certain server, the second client query is more
likely to go to server B, which is better.

Note #1: The user will sometimes seem to have a duplicate email.
Since duplicate email is more acceptable than lost email, this should
be acceptable in most environments.

Note #2: if clients restart sessions very often in this scenario,
it's possible to have thrashing. But under normal conditions, ie.
where new sessions are relatively rare, this should be a relatively
rare occurrence.

(E3) Multimaster, connection breaks, user can only reach one server:
until the connection breaks, communication is the same as in scenario
(E2). Once the connection breaks, each server is generating UIDs
locally without being aware that the other server is assigning them
as well. If the user can only connect to one server (ie. user is at a
WAN site, WAN site has local server "A", WAN connection is down) the
user can continue to send and receive mail using the local server.
Server "A" will update high_reported appropriately. Remote server "B"
may continue to receive mail for the user, but high_reported will not
be updated. When connectivity is restored, so long as the user
continues (for the short term) to use the same server, no duplicate
email should result.

(E4) Multimaster, connection between servers breaks, user can reach
both: until the connection breaks, communication is the same as in
scenario (E2). If the user communicates with both servers, each
server will independently increase its reported UID. When the
connection is reestablished, one or both servers will reassign UIDs
to the other's email, resulting in apparently duplicate email. Gotta
break some eggs. Can be mitigated if the user has affinity for the
last mail server.

(E5) When connectivity is broken between servers and then is
reestablished, there will be a while when the server is both catching
up and receiving new email and IMAP connections. There is potential
here for duplicate email. Can be mitigated if the user has affinity
for the last mail server.

(E6) The process_message_UID table will be a choke point if you have
a lot of servers. This scheme is good for high availability, bad for
scalability.

OK, I probably spent way too long thinking this through and working
scenarios. Did I miss anything?

- Morty
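
P.S. To make (S1), (S5) and (S6) a bit more concrete, here is a rough
Python sketch for a single mailbox. It is illustration only, not code
from any real mail server: the names next_uid and reconcile, the
plain arguments standing in for the replicated tables, and the choice
to start counting from 0 are all assumptions made for the example.

```python
UID_MAX = 2**32 - 1  # (R1): UIDs must fit in 32 bits


def next_uid(server_id, num_servers, highest_known):
    """(S1)/(S5): smallest UID of the form
    local_sequence * num_servers + server_id that is greater than
    highest_known (assumed to be the max over high_saved for all servers).
    """
    local_sequence = highest_known // num_servers
    uid = local_sequence * num_servers + server_id
    if uid <= highest_known:
        uid += num_servers  # step to this server's next slot
    if uid > UID_MAX:
        raise OverflowError("ran out of 32-bit UIDs")
    return uid


def reconcile(pending_uids, high_reported, server_id, num_servers, high_saved):
    """(S6): look at the UIDs queued for this server in process_message_UID.

    Returns {old_uid: new_uid} for every queued UID that is lower than
    the last UID this server reported to the client. UIDs greater than
    or equal to high_reported are left alone. The caller is assumed to
    delete all processed rows from the queue afterwards.
    """
    renumbered = {}
    highest = max(high_saved, high_reported)
    for uid in sorted(pending_uids):
        if uid < high_reported:
            highest = next_uid(server_id, num_servers, highest)
            renumbered[uid] = highest
    return renumbered


# (E1) single server: the scheme degenerates into a simple sequence.
single = [next_uid(0, 1, 0), next_uid(0, 1, 1), next_uid(0, 1, 2)]  # [1, 2, 3]

# (E2) two servers, A = 0 and B = 1, both knowing highest UID 1.
uid_a = next_uid(0, 2, 1)  # 2, i.e. "N" on server A
uid_b = next_uid(1, 2, 1)  # 3, i.e. "N+1" on server B

# The user queries B and is told 3; then A's message (UID 2) replicates
# to B. B sees 2 < 3 and moves that message past everything it knows.
fixed = reconcile([uid_a], high_reported=uid_b, server_id=1,
                  num_servers=2, high_saved=uid_b)  # {2: 5}
```

The renumbered message shows up to the client as a new UID, which is
the "apparent duplicate" from Note #1. Note that the new UID comes out
of B's own slots, so it cannot collide with anything A generates.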