> In fact, when reconnecting, the replica should indicate the latest
> CSN it received, so the server can push the modifications from this
> CSN up to the latest local CSN.

> There are two issues with this approach:
> - deleted entries.
> - if the replica is connected to more than one other server, it will
> receive a hell of a lot of modifications from all the connected
> servers at the same time.

I'm not sure why deleted entries are any different from other modifications 
here.  The delete modification will be sent to the connecting replica where it 
will be applied.

Having lots of replicas shouldn't be a problem when a downed replica comes back 
up.  The first replica to start replicating to the newly revived replica will 
acquire a lock - all other replicas will wait until the next replication cycle. 
 This would also become much less of a problem if we don't need a replica to be 
connected to all other replicas.
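
For the lock I'm imagining something as simple as this - a sketch with
made-up names, using tryLock so losing peers skip the cycle instead of
blocking on it:

    import java.util.concurrent.locks.ReentrantLock;

    // "First replicator wins": whichever peer starts replicating to the
    // revived replica first takes the lock; the others skip this cycle
    // and retry on the next one. Hypothetical names throughout.
    class ReplicationTarget
    {
        private final ReentrantLock sessionLock = new ReentrantLock();

        /** Returns true if we replicated, false if another peer beat us. */
        boolean tryReplicate( Runnable pushPendingModifications )
        {
            if ( !sessionLock.tryLock() )
            {
                // Another replica is already pushing its backlog; wait
                // for the next replication cycle instead of piling on.
                return false;
            }

            try
            {
                pushPendingModifications.run();
                return true;
            }
            finally
            {
                sessionLock.unlock();
            }
        }
    }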


> Right now, from the code I can read, the deleted entries are
> "tombstoned". Maybe we can get rid of this, as we also store the
> delete operation in the Derby store at this point.

Yeah, I should have phrased it as "We _have_ tombstone entries but don't _use_ 
them".


> I rethought about this, and the problem is that we won't be able to
> resync a server that has been disconnected for too long, unless we
> simply erase its full base and ask for all the entries. That can be
> costly when you have millions of entries! However, in this very case
> (let's say you keep one week of modifications), if you get out of this
> window, the best option would probably be to restore the base from a
> backup (way faster than reinjecting all the entries one by one!), and
> then resync using the modification log.
> 
> So the modification log should only contain a limited number of
> modifications, depending on the configured storage period.
> 
> In order to get this working, we have to implement a decent DRS
> (Disaster Recovery System) too, which is on its way, as it's just a
> specific implementation of the current changelog interceptor (we have
> to store the modifications on disk, but not the reverse modifications,
> as is done with the ChangeLog mechanism).

Exactly.  When a replica comes back up after more than a week's downtime, it 
would have to be treated as a new replica, with its DIT replaced.
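
The decision on reconnect then reduces to a window check, plus a periodic
purge of the log - a sketch only, assuming the one-week retention mentioned
above and made-up names:

    import java.time.Duration;
    import java.time.Instant;

    interface ModificationStore
    {
        void deleteOlderThan( Instant cutoff );
    }

    class ResyncPolicy
    {
        // Assumed retention window, per the discussion above.
        private final Duration storagePeriod = Duration.ofDays( 7 );

        /** True if the replica can catch up from the modification log. */
        boolean canResyncIncrementally( Instant lastSeenChange )
        {
            return lastSeenChange.isAfter( Instant.now().minus( storagePeriod ) );
        }

        /** Drop log entries older than the window (run periodically). */
        void purgeOldModifications( ModificationStore store )
        {
            store.deleteOlderThan( Instant.now().minus( storagePeriod ) );
        }
    }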

I think Alex mentioned a while back that it would be good to merge the 
changelog interceptor with the replication system - it seems a waste to have 
two pieces of code both maintaining modification logs.  I'm not sure how close 
they are though.

It would also be good to have automatic backups as just another replica with 
certain options (read only, sync'd periodically).
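
I picture that as the replica configuration growing a couple of options -
purely hypothetical, nothing like this exists yet:

    // Hypothetical replica configuration: a backup is just a replica
    // that is read-only and synchronised on a schedule instead of
    // continuously.
    class ReplicaConfig
    {
        boolean readOnly = false;     // reject client writes here
        long syncIntervalMillis = 0;  // 0 = replicate continuously

        static ReplicaConfig nightlyBackup()
        {
            ReplicaConfig cfg = new ReplicaConfig();
            cfg.readOnly = true;
            cfg.syncIntervalMillis = 24L * 60 * 60 * 1000; // once a day
            return cfg;
        }
    }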


Martin

