Re: [Dbmail-dev] multimaster replication and IMAP

Geo Carncross Fri, 22 Jul 2005 16:46:52 +0200 (CEST)

On Wed, 2005-07-20 at 00:23 -0400, Mordechai T. Abzug wrote:
> On Fri, Jul 15, 2005 at 02:02:27PM -0400, Geo Carncross wrote:
> 
> > Won't work. As much as it seems like this would be a good idea (and
> > believe me: about half a dozen people on this list have had it, so
> > it certainly is a good idea. better still, don't believe me, check
> > the archive yourself :) )
> 
> Thanks.  I've now gone through a bunch of archived messages looking to
> understand this.
> 
> Suggestion in short: why not keep track of which messages have been
> seen by each server, and if a server senses a potential issue
> (ie. after network problems are fixed, or whatever) correct it then?
> [More detail later in the message.]
> 
> As I understand it, the requirements are like so:
> 
> 
> (R0) Dropping email is not acceptable.


Agreed.

> (R1) Follow RFC: UIDs must be 32-bit values.

31-bit unsigned. Thunderbird (and others) get confused and start issuing
negative numbers if the high-bit is set.
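
A minimal sketch of the sign problem, assuming a client that stores UIDs
in a signed 32-bit integer. The helper name as_int32 is hypothetical and
only serves the illustration:

```python
import struct

def as_int32(uid: int) -> int:
    """Reinterpret an unsigned 32-bit UID as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", uid))[0]

MAX_SAFE_UID = 2**31 - 1  # largest UID with the high bit clear

print(as_int32(MAX_SAFE_UID))      # 2147483647
print(as_int32(MAX_SAFE_UID + 1))  # -2147483648
```

Any UID allocation scheme therefore has 31 bits to work with, not 32.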

> (R2) Follow RFC: UIDs must monotonically increase.  In particular, a
>      replication that results in messages with UIDs that precede a UID
>      value reported to a user is bad, and must be corrected.

Correction: It must never happen. If it does, the user will never
download the message.

> (R3) Each server in a replicating cluster should be read-write,
>      ie. multimaster updates should be allowed.

...

> (R4) Multimaster should work gracefully in failure scenario where
>      client and internet can reach each server, but servers cannot
>      reach each other.

It can, but you cannot accept _new_ messages safely (APPEND, SMTP).

> Assumptions:
> 
> (A0) Duplicating email (ie. causing the user to see it a second time)
>      isn't so bad if it's infrequent.

Clarify please.

> (A1) Users start new IMAP session relatively infrequently -- a new
>      connection is not established within seconds of a previous
>      connection.

Incorrect. My Palm starts a new IMAP session every time it "wakes up" to
check the mailbox. So does my cell phone.

> (A2) Users should have an affinity to a particular server, and should
>      only switch/be switched to another server in the event of a
>      failure.

Incorrect. Many people want load-balancing. High-availability access to
reading emails. They must be willing to accept delays of new messages
when the cluster is damaged.

> (A3) The most likely mode of failure is that a server is unreachable
>      by a user.  This is the scenario that should be most engineered
>      to not duplicate mail.

....

> (A4) Loss of connectivity between two or more mail servers while
>      both/all servers are still visible to the same users is a rare
>      occurrence.  Mail server should meet minimal requirements (ie. not
>      drop mail) but some duplicate mail in this case is acceptable.
>      This can be mitigated by giving users an affinity for a server.

Why bother duplicating email? I don't understand why you think that
duplicating email buys anything.

> Suggestion in detail:
> 
> 
> (S0) Each server in a multimaster cluster is assigned a unique
>      server_id.  If multimaster isn't necessary, server_id for all
>      servers is 0.
> 
> (S1) Locally generate UIDs in a way that is globally unique
>      (ie. splitting message sequence count using a local_sequence *
>      num_servers + server_id type scheme, or some similar method.)
>
> (S2) Each server keeps a replicated table "high_saved" of the last
>      locally-generated UID it's saved.  Index by mailbox and
>      server_id.
> 
> (S3) Each server keeps a replicated table "high_reported" of the last
>      UID it's reported to the client.  Index by mailbox and server_id.

Won't work. A client that performs the following operations will lose
email:

* Client (C) connects to host (A) sees uidvalidity mismatch; gets 1,2,4
* C connects to host (B) sees exists "4", tries to fetch uid 5, fails.

The problem is that "B" really has uids "1,2,3,4" but the client saw a
"gap" at "3" and will never download "3". RFC2060 _recommends_ clients
perform this optimization, and many clients do.

This violates your condition "R0".
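
The scenario above can be sketched as a small simulation. It assumes a
client that only asks for UIDs above the highest one it has already
seen; the class name CachingClient and the sets host_a and host_b are
made up for the illustration:

```python
class CachingClient:
    """Client that only asks for UIDs above the highest one it has seen."""

    def __init__(self):
        self.highest_seen = 0
        self.downloaded = set()

    def sync(self, server_uids):
        # Fetch only messages whose UID is above the cached high-water mark.
        new = [u for u in sorted(server_uids) if u > self.highest_seen]
        self.downloaded.update(new)
        if new:
            self.highest_seen = new[-1]
        return new

host_a = {1, 2, 4}       # UID 3 has not reached A yet
host_b = {1, 2, 3, 4}    # B already holds all four messages

client = CachingClient()
client.sync(host_a)      # downloads 1, 2, 4
client.sync(host_b)      # asks for UIDs above 4, gets nothing
missing = host_b - client.downloaded
print(sorted(missing))   # [3]
```

The message with UID 3 is still in the store, but this client never
asks for it.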

UID numbers must be STRICTLY increasing for a mailbox. Nothing about
RFC2060 says "strictly increasing for a server". The consequence is that
one of the following needs to happen:

* break uidvalidity, forcing clients to download all message headers
  again

* force uid generation to be serialized.

Neither of these steps causes "duplicate emails", but violating RFC2060
in the manner that [keeps] being brought up will cause mail to be lost.

My XID recommendation solves this problem by moving sequence generation
onto the client, and only requires a few more bytes than UID numbers per
message. Furthermore, clients don't have to keep a UID/ID map, so it
actually uses less memory.


> (S4) Each server keeps a replicated table, "process_message_UID", that
>      is basically a message to each other server to make sure the UID
>      is OK.  Index by mailbox, remote server_id, UID.

...

> (S5) Each time an email arrives at a server (via SMTP or IMAP, not via
>      replication), the server generates a new UID using scheme from
>      part (S1) that is greater than any value currently in high_saved,
>      for any server.  Then, for each server_id other than itself, it
>      creates a row in process_message_UID.  Then, update high_saved
>      with the new UID.

How does it KNOW that the UID is going to be "greater" than any other
server? What happens if:

T1 Host "A" receives message
T1 Host "B" receives message
T2 Host "A" generates UID
T2 Host "B" generates UID
T3 Host "A" writes message to Store
T3 Host "B" writes message to Store
T4 Host "A" sees "B" message
T5 Host "B" sees "A" message

Since your servers can be disconnected "for a really long time" (e.g.
more than a few milliseconds) clients will lose email if both servers
receive messages during that time _OR_ if one of them receives messages
but the client connects to the other.
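
A rough sketch of that failure, assuming the interleaved numbering from
(S1) with two servers that cannot see each other's high_saved while
disconnected. The Server class and variable names are invented for the
example:

```python
NUM_SERVERS = 2

class Server:
    def __init__(self, server_id):
        self.server_id = server_id
        self.local_sequence = 0
        self.store = set()

    def deliver(self):
        # (S1): uid = local_sequence * num_servers + server_id
        self.local_sequence += 1
        uid = self.local_sequence * NUM_SERVERS + self.server_id
        self.store.add(uid)
        return uid

a, b = Server(0), Server(1)

# While disconnected, A receives one message and B receives three.
a.deliver()                                # UID 2
b_uids = [b.deliver() for _ in range(3)]   # UIDs 3, 5, 7

# The client talks to B during the outage and remembers the highest UID.
highest_seen = max(b.store)                # 7

# Connectivity returns and the stores are merged.
merged = a.store | b.store
late = sorted(u for u in merged - b.store if u < highest_seen)
print(late)  # [2]
```

UID 2 now sits below a UID the client has already seen, so a client that
only fetches upward will skip it.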

> (S6) Periodically, as a maintenance thread, each server checks
>      process_message_UIDs for any message sent to its server_id.
> 
>      LOOP foreach message: if the message UID is lower than the last
>      reported UID known for this server and mailbox, change the UID of
>      the email to a new UID as per (S1) and (S5), and delete message
>      from process_message_UIDs.  If greater than or equal, delete
>      message from process_message_UIDs without taking an action.

....

> (S7) When the user client connects and wants the last UID, first
>      perform step (S6).
> 
>      When done processing all messages in the process_message_UID
>      queue for that server_id, report new high UID to user and update
>      high_reported.

{ see other email }

> Examples/scenarios/analysis:
> 
> (E1) Single server, or multiple servers with a single master.  Step
>      (S1) degenerates into a simple sequence.  Step (S5) does the
>      same.  Steps (S6) and (S7) are basically skipped, since there are
>      no other servers to exchange messages with, so the loops are
>      empty.  So the server does no additional heavy lifting.

There must be a single master. My token-passing algorithm merely allows
each master to take turns. There does not exist a mechanism by which
IMAP can truly be turned into multi-master.

The best we can do is load-balancing, unless we can convince clients to
adopt my XID scheme.
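
One way to picture masters "taking turns" is a token that carries the
next UID, where only the current holder may assign UIDs. This is a guess
at the general idea only, not the actual token-passing implementation;
Token, Master and their methods are all hypothetical:

```python
class Token:
    def __init__(self):
        self.next_uid = 1

class Master:
    def __init__(self, name):
        self.name = name
        self.token = None    # only one master holds the token at a time
        self.pending = []    # messages waiting for a UID
        self.store = {}      # uid -> message

    def receive(self, message):
        self.pending.append(message)
        self.flush()

    def flush(self):
        # Without the token, messages stay queued and get no UID yet.
        if self.token is None:
            return
        while self.pending:
            self.store[self.token.next_uid] = self.pending.pop(0)
            self.token.next_uid += 1

    def pass_token(self, other):
        other.token, self.token = self.token, None
        other.flush()

a, b = Master("A"), Master("B")
a.token = Token()
a.receive("m1")    # A holds the token: UID 1
b.receive("m2")    # B has no token: m2 waits
a.receive("m3")    # UID 2
a.pass_token(b)    # B now assigns UID 3 to m2
```

UIDs stay strictly increasing across both hosts, at the price of
delaying new mail on whichever host does not hold the token.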

> (E2) Multimaster load sharing, communication OK between servers: all
>      messages assigned UIDs uniquely.  In general, UIDs will increase,
>      but under some race conditions, a server will perceive a UID to
>      step back due to replication.  If no client actually asked about
>      UIDs during the race condition, no action is taken.  If a user
>      timed things "just right", so that message with ID N arrives on
>      server A and message with ID N+1 arrives on server B, and user
>      queries B, gets N+1, then email is replicated to B, and the
>      replication spreads the news that A has a message UID N for B.  B
>      should auto-sense the problem (the next time the user queries B,
>      or the next time B does its maintenance check) and B should
>      update the UID to something beyond the current known max.  So, if
>      the second client query is to B, the client will automatically
>      correct.  If the second client query is to A, the server will
>      initially give an old UID, but then B will correct it.  If the
>      configuration is such that clients prefer their last server or a
>      certain server, the second client query is more likely to go to
>      server B, which is better.
> 
>      Note #1: User will sometimes seem to have a duplicate email.
>      Since duplicate email is more acceptable than lost email, this
>      should be acceptable in most environments.
> 
>      Note #2: if clients restart sessions very often in this scenario,
>      it's possible to have thrashing.  But under normal conditions,
>      ie. where new sessions are relatively rare, this should be a
>      relatively rare occurrence.

...

> (E3) Multimaster, connection breaks, user can only reach one server:
>      until the connection breaks, communication is the same as in
>      scenario (E2).  Once the connection breaks, each server is
>      generating UIDs locally without being aware that the other server
>      is assigning them as well.  If the user can only connect to one
>      server (ie. user is at a WAN site, WAN site has local server "A",
>      WAN connection is down) the user can continue to send and receive
>      mail using the local server.  Server "A" will update
>      high_reported appropriately.  Remote server "B" may continue to
>      receive mail for the user, but high_reported will not be updated.
>      When connectivity is restored, so long as user continues (for the
>      short term) to use the same server, no duplicate email should
>      result.

But mail will be lost if mail can be received at "A" as well (think:
local mail)


> (E4) Multimaster, connection between servers breaks, user can reach
>      both: until the connection breaks, communication is the same as
>      in scenario (E2).  If the user communicates with both servers,
>      each server will independently increase reported UID.  When
>      connection is reestablished, one or both servers will reassign
>      UIDs to the other's email, resulting in apparently duplicate
>      email.  Gotta break some eggs.  Can be mitigated if user has
>      affinity for last mail server.

What if the connection BETWEEN servers breaks, but the client can still
access each (say they have a private dial-up connection)?

> (E5) When connectivity is broken between servers and then is
>      reestablished, there will be a while when the server is both
>      catching up and receiving new email and IMAP connections.  There
>      is potential here for duplicate email.  Can be mitigated if user
>      has affinity for last mail server.

No, there is no potential for duplicate email; compare SHA1 hashes of
messages as part of replication. Better still: rely on the SQL server to
handle data-replication as I did for my token-passing algorithm.
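
A minimal sketch of hash-based de-duplication during a merge, assuming
messages are available as raw bytes. The function names are made up, and
a real store would likely keep the digest in an indexed column rather
than recompute it:

```python
import hashlib

def message_digest(raw: bytes) -> str:
    """SHA1 digest of the raw message bytes, as a hex string."""
    return hashlib.sha1(raw).hexdigest()

def merge_without_duplicates(local, incoming):
    """Return local plus any incoming messages whose digest is new."""
    seen = {message_digest(m) for m in local}
    merged = list(local)
    for m in incoming:
        d = message_digest(m)
        if d not in seen:
            seen.add(d)
            merged.append(m)
    return merged

local = [b"msg one", b"msg two"]
incoming = [b"msg two", b"msg three"]
print(merge_without_duplicates(local, incoming))
```

Only "msg three" is added here, since "msg two" is already present under
the same digest.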

> (E6) The process_message_UID table will be a choke point if you have a
>      lot of servers.  This scheme is good for high availability, bad
>      for scalability.
> 
> OK, I probably spent way too long thinking this through and working
> scenarios.  Did I miss anything?

Sorry, lots :)

If you really think that I missed something here, note that I'm not
looking for a fancy way to replicate the data; SQL servers can already
do that safely and without duplicating messages. You're only going to
confuse the issue by trying to deal with that as well.

The _hard_ part is serializing UID numbers. Because of the requirement
in RFC2060 of them being strictly increasing, any solution "that you
come up with" has to conform or the user _WILL_ lose email. If you think
you know a way (or can come up with a way) to solve this problem, you'll
have solved many billion-dollar problems at the same time. I'm under the
impression that it's unsolvable, and while I _think_ previous proofs on
the subject are correct, I'm willing to be shown wrong.


-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/
