[Mitosis] random thoughts ...

Emmanuel Lecharny Sat, 17 Jan 2009 16:16:07 -0800

After having discussed replication with Alex yesterday, I thought a bit about what a replication system means, and I came to the conclusion that we should not consider replication from a server-to-server perspective, but as a whole. Ok, I know, it's a bit fuzzy. Let me explain what I have in mind.

First, let's consider that all the servers are connected and replicate correctly, without any kind of problem (i.e., they never get disconnected, they are all time-synchronized, and every operation has its own unique timestamp). In this ideal case, the full set of LDAP servers should be seen as a single LDAP server: everything is available from any server, without any difference.

If at least one server gets disconnected, then you have split this big virtual LDAP server in two parts: the disconnected server, and the rest of them. As the remaining servers are still all connected and perfectly synchronized, it's really as if we had one giant LDAP server again, so we are just facing two disconnected LDAP servers.

Moving a bit forward, if M servers get disconnected from a group of N servers, we fall back into the same situation: the M servers are seen as a single LDAP server, and so are the remaining ones.

One step further, if the set of servers is fragmented into many small disconnected sets, then each of those sets is seen as a single LDAP server.

Ok, so far so good. Where did it bring us? I think that replication per se is just a matter of managing replication between 2 servers; any other case can fall back into this category.

Now, how do we manage replication between server A and server B (whatever the number of real servers present in A and B)? Simple: as each operation within A or B is done on a globally connected system, with each operation having its unique timestamp (i.e., two operations have two different timestamps), all the modifications done globally are ordered. It's then just a matter of merging the two ordered lists of operations from A and B, and applying them from the oldest operation to the newest one. Let's see an example:

Server A and server B were synchronized at t0, when the connection was broken. Since then, many modification operations occurred on both servers:

on Server A : op[1, t1], ..., op[i-1, ti-1], op[i, ti], op[i+1, ti+1], ..., op[n-1, tn-1], op[n, tn]
on Server B : op[1, t'1], ..., op[j-1, t'j-1], op[j, t'j], op[j+1, t'j+1], ..., op[m-1, t'm-1], op[m, t'm]

Server A and server B are now connected back to each other. Each modification done on B is to be applied on A, and each operation done on A must be applied on B. What if some of those operations are conflicting? Let's just come back to t0, when both servers were synchronized. If we consider that the servers had remained synchronized all along the connection breakage, then A and B would have received the modifications from each other at the very moment they occurred, and each conflict would have resulted in an error being sent to the client.


Let's act as if the connection never broke, then:

We restore the initial state of A and B to t0 (which is possible, as we have the ChangeLog system, allowing us to revert to a previous state). Of course, we do so on both servers. Now, let's merge the modifications from A and B:

op[A, 1, t1], op[B, 1, t'1], ..., op[B, j-1, t'j-1], op[B, j, t'j], op[A, i-1, ti-1], op[B, j+1, t'j+1], op[A, i, ti], op[A, i+1, ti+1], ..., op[A, n-1, tn-1], op[B, m-1, t'm-1], op[B, m, t'm], op[A, n, tn]

As the operations might have occurred at different times on both servers, they have been interleaved, but in any case, as each operation is supposed to have a unique timestamp, the resulting list of modifications is still ordered, on both servers.

Now, after having reverted to state t0, we just have to inject the modifications from the merged list on A and B, rejecting every modification which results in an error. At the end, A and B will be perfectly synchronized, without conflicts.
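
To make the merge-and-replay idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions: the directory is modeled as a plain set of DNs, there are only "add" and "delete" operations, and the names Op, merge_ops, apply_op and replay are hypothetical (they are not part of Mitosis or of the ChangeLog API).

```python
import heapq
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical logged operation; timestamps are assumed globally unique."""
    timestamp: int
    server: str
    kind: str  # "add" or "delete"
    dn: str


def merge_ops(ops_a: List[Op], ops_b: List[Op]) -> List[Op]:
    """Merge two lists that are already ordered by timestamp into one ordered list."""
    return list(heapq.merge(ops_a, ops_b, key=lambda op: op.timestamp))


def apply_op(state: Set[str], op: Op) -> bool:
    """Apply one operation to the state; return False if it would be an error."""
    if op.kind == "add":
        if op.dn in state:
            return False  # entry already exists
        state.add(op.dn)
        return True
    if op.kind == "delete":
        if op.dn not in state:
            return False  # no such entry
        state.remove(op.dn)
        return True
    return False  # unknown operation kind


def replay(state_t0: Set[str], merged: List[Op]) -> Tuple[Set[str], List[Op]]:
    """Start from a copy of the t0 state and apply the merged list, oldest first."""
    state = set(state_t0)
    rejected = []
    for op in merged:
        if not apply_op(state, op):
            rejected.append(op)
    return state, rejected


# State shared by A and B at t0, then the operations logged on each side.
state_t0 = {"cn=x"}
ops_a = [Op(1, "A", "add", "cn=y"), Op(4, "A", "delete", "cn=x")]
ops_b = [Op(2, "B", "add", "cn=y"), Op(3, "B", "delete", "cn=x")]

merged = merge_ops(ops_a, ops_b)
final_state, rejected = replay(state_t0, merged)
```

In this toy run, the add done on B at t=2 and the delete done on A at t=4 are rejected, exactly as they would have been if the connection had never broken, and both sides end up with the same state.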

Now, remember that A and B are not single servers, but sets of servers. It doesn't matter too much, as we can consider that all the servers in set A and set B are totally replicated, so they are in the very same state, and the merged list can just be applied the same way to any server from A and B.

What if we have many groups of disconnected servers? This is a bit more complex, but not so much. We just have to replicate the groups 2 by 2, or assume they are replicated 2 by 2, and at the end of a potentially long process, where we revert back to the time the servers were disconnected and reapply all the merged modifications, we will be back in the same state for all the servers.
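
As a rough illustration of the "2 by 2" idea, the sketch below (Python) folds the operation lists of several groups pairwise. It assumes each operation is a hypothetical (timestamp, group) tuple with a globally unique timestamp; merge_pair and merge_groups are names made up for this example. Under that assumption, the pairing order does not change the resulting list.

```python
import heapq
from functools import reduce
from typing import List, Tuple

Operation = Tuple[int, str]  # (unique timestamp, originating group), illustrative only


def merge_pair(x: List[Operation], y: List[Operation]) -> List[Operation]:
    """Merge two timestamp-ordered lists into one ordered list."""
    return list(heapq.merge(x, y))


def merge_groups(groups: List[List[Operation]]) -> List[Operation]:
    """Merge the groups 2 by 2, from left to right."""
    return reduce(merge_pair, groups, [])


groups = [
    [(1, "A"), (5, "A")],
    [(2, "B"), (4, "B")],
    [(3, "C")],
]

merged_forward = merge_groups(groups)
merged_backward = merge_groups(list(reversed(groups)))
```

Both calls give the same ordered list, which is the one to replay from the common starting state.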


There are only two conditions we must meet:
1) the servers must be time-synchronized,

2) the modification timestamps must be unique, whatever server they were made on.

Condition 2 can easily be met with the existing CSN, if we consider that there is an order on the replicas (i.e. A < B < C, ... where A, B, C are the replica ids). This is purely conventional, but necessary.
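
A small sketch of that convention, in Python. SimpleCsn is a hypothetical, simplified (timestamp, replica id) pair used only for illustration; it is not the actual CSN layout. Comparing the timestamp first and the replica id second gives a total order even when two replicas produce the same timestamp.

```python
from typing import NamedTuple


class SimpleCsn(NamedTuple):
    """Simplified stand-in for a CSN: compared by timestamp, then by replica id."""
    timestamp: int
    replica_id: str


csns = [SimpleCsn(1000, "B"), SimpleCsn(999, "C"), SimpleCsn(1000, "A")]

# Tuples compare field by field, so equal timestamps are ordered by replica id (A < B < C).
ordered = sorted(csns)
```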

Regarding condition #1, we can't guarantee that all the servers will use the same time. We just do our best to get this as accurate as possible.

Last, but not least: the triggers. If some modification can trigger some other modifications (because of integrity constraints being activated), then those should be logged in the change log. When replicating, the triggers _must_ be disabled, as the merged operations will already contain all the triggered operations.

Ok, I'm done now. All this is of course a coarse approximation, but I think it's pretty close to what we need to deal with.

Please just tell me if I'm not totally off the rails, or if you think I have just done too much pot lately ;)


Thanks!

--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org
