On Thu, Jan 15, 2009 at 12:55 PM, Martin Alderson <[email protected]> wrote:

>> the current implementation of replication, AFAICT, is based on a pull
>> system.
>
> The current system seems more like push than pull to me, in the sense that
> one replica decides to send modifications to another without being asked
> for them.
Perfectly right.

> The process of A (client) replicating modifications to B (server) at the
> moment is:
>
> 1. A connects to B.
> 2. Periodically, A asks B for its current CSN vector.
> 3a. If A still has enough modifications in its log to bring B up to date:
> A sends B all the modifications that it has newer than B's CSN vector.
> 3b. If A doesn't have enough modifications in its log to bring B up to date:
> A sends B all entries in the DIT.
> 4. B applies the incoming modifications.
>
> There is a problem that you mention, where B goes down and A just keeps
> trying to reconnect blindly. It would be nice if the responsibility lay
> with B to notify all other replicas when it comes back up.

Yep. Otherwise, we use the current retry mechanism: a thread tries to connect
to the replica; if it fails, it waits 2 secs, then 4, 8, 16, 32 and 60 secs,
until it gets a response back (there is a small sketch of this loop further
down).

>> - how many threads should we have? A pool or one thread per replica?
>
> I would say one thread per replica is OK (I think that's what we have now).
> Ideally we would have a system where we don't need to be connected to every
> replica (i.e. if A is connected to B and B is connected to C we don't need
> A to be connected to C).
>
>> The connecting replica could send the last entryCSN received, and then a
>> search can be done on the server for every CSN with a higher CSN. Then
>> you don't need to keep any modification on disk, as they are already
>> stored in the DiT.
>
> This would only be OK if you could guarantee that no new modifications
> would occur on the reconnecting replica until it has been brought back up
> to date. I think the current method of just sending the modification logs
> works better, especially when the replica was only disconnected for
> something like a temporary network glitch.

In fact, when reconnecting, the replica should indicate what was the latest
CSN it received, so the server can push back the modifications from this CSN
up to the latest local CSN. There are two issues with this approach:
- the deleted entries;
- if the replica is connected to more than one other server, it will receive
  a hell of a lot of modifications from all the connected servers at the same
  time.
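To make this catch-up a bit more concrete, here is the kind of server-side
sketch I have in mind; the ModificationLogStore and Modification names are
just invented for the example, they are not the existing Mitosis classes:

import java.util.Collection;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: these names are made up for the example, not the real
// Mitosis API.
public class ModificationLogStore
{
    /** Placeholder for whatever a logged operation really contains. */
    public interface Modification
    {
    }

    // Log entries kept sorted by CSN. This sketch assumes the CSN string
    // representation sorts in the same order as the CSNs themselves.
    private final NavigableMap<String, Modification> log =
        new TreeMap<String, Modification>();

    public void add( String csn, Modification modification )
    {
        log.put( csn, modification );
    }

    /** Modifications strictly newer than the CSN announced by the
     *  reconnecting replica. */
    public Collection<Modification> getModificationsAfter( String lastReceivedCsn )
    {
        return log.tailMap( lastReceivedCsn, false ).values();
    }
}

If the log is kept sorted by CSN, answering a reconnecting replica is just a
tail read; the only open question is how far back the log has to reach, which
is the purge discussion below.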
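And since I mentioned the retry mechanism above, this is roughly the
per-replica reconnection loop I was describing (again only a sketch; the
ReplicaConnector interface is invented for the example):

// Sketch only: plug in whatever really opens the connection to the
// remote replica behind ReplicaConnector.
public class ReconnectionTask
{
    public interface ReplicaConnector
    {
        boolean tryConnect();
    }

    // The delays described above: 2, 4, 8, 16, 32 and then 60 seconds.
    private static final long[] RETRY_DELAYS_SECONDS = { 2, 4, 8, 16, 32, 60 };

    public void run( ReplicaConnector connector ) throws InterruptedException
    {
        int attempt = 0;

        while ( !connector.tryConnect() )
        {
            // Once we reach the last step, keep waiting 60 seconds
            // between attempts.
            int index = Math.min( attempt, RETRY_DELAYS_SECONDS.length - 1 );
            Thread.sleep( RETRY_DELAYS_SECONDS[index] * 1000L );
            attempt++;
        }
    }
}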
> I don't think there's a reason to have a separate list of modification logs
> for each replica stored in the server - we can just keep the main
> modification logs around until all replicas have them.

Right. I had overlooked this, I think.

>> The way Mitosis works atm is to keep the deleted entries in the DiT with
>> an added attribute telling if the entry has been deleted, so we keep them
>> in the DiT (but not available for standard operations) until all the
>> replicas have been updated. So a disconnected replica which reconnects
>> will get the deleted entry info when it connects back.
>
> As far as I can see mitosis doesn't really use these tombstone entries. A
> delete operation is stored in the modification logs which are sent to any
> replica that doesn't have them yet.

Right now, from the code I can read, the deleted entries are "tombstoned".
Maybe we can get rid of this, as we also store the delete operation into the
Derby store at this point.

>> How to handle the real deletion is the problem, as we have to keep a
>> state of each replica...
>
> Personally I think we can stop using tombstone entries completely and just
> rely on the modification logs. The question then is when do we purge the
> modification logs? At the moment a modification log item is purged after a
> certain (configurable) amount of time. It perhaps would be nicer if we kept
> modification log items around while we know other replicas still need them.
> This would just involve storing the CSN vector for each known replica.

I rethought about this, and the problem is that we won't be able to resync a
server that has been disconnected for too long a period, unless we simply
erase its full base and ask for all the entries. That can be costly when you
have millions of entries !

However, in this very case (let's say you keep a one week period of
modifications), if you get out of this window, the best would probably be to
restore the base from a backup (way faster than reinjecting all the entries
one by one !), and then resync using the modification log. So the
modification log should only contain a limited number of modifications,
depending on the configured storage period.

In order to get this working, we have to implement a decent DRS too (Disaster
Recovery System), which is on its way, as it's just a specific implementation
of the current ChangeLog interceptor (we have to store the modifications on
disk, but not the reverse modifications, as is done with the ChangeLog
mechanism).

PS: I will try to summarize all those ideas on the wiki page later. Sadly,
I'm pretty busy atm, having to sweat for a client :/

Thanks !

--
Regards, Cordialement,
Emmanuel Lécharny
www.iktek.com
