Re: [ApacheDS][Mitosis] Replication data

Martin Alderson Wed, 05 Dec 2007 16:01:11 -0800


Thanks for the responses, all.

Apologies for the delay in getting back to you; I'm having a family problem at the moment, so I have very little spare time.

I thought having the replication logs stored in LDAP sounded nice: for new replicas we have to send all replicatable entries, but after that the log LDAP entries can be sent instead. It would be pretty much the same code logic, and it just seemed to solve all the problems with a large amount of code re-use. I was worried about possible performance hits, though, and it sounds like you (Alex) don't want to store the logs in LDAP for the same reason.
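To make the "full entries first, log entries afterwards" idea concrete, here is a minimal sketch of the decision a supplier might make per peer. It is not Mitosis code: `LogEntry`, `PeerSync` and `itemsToSend` are hypothetical names, a plain `long` stands in for a real CSN, and the log is assumed to be kept in ascending CSN order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log record; a plain long stands in for a real CSN.
record LogEntry(long csn, String dn, String change) {}

final class PeerSync {
    // lastSeenCsn == null means the peer is a brand-new replica.
    static List<String> itemsToSend(Long lastSeenCsn, List<String> allEntries, List<LogEntry> log) {
        if (lastSeenCsn == null) {
            // New replica: send every replicatable entry in full.
            return new ArrayList<>(allEntries);
        }
        // Known replica: send only the log entries newer than the last CSN it has seen.
        // Assumes 'log' is already in ascending CSN order.
        List<String> out = new ArrayList<>();
        for (LogEntry entry : log) {
            if (entry.csn() > lastSeenCsn) {
                out.add(entry.dn() + ": " + entry.change());
            }
        }
        return out;
    }
}
```

If the log entries were themselves LDAP entries, both branches would hand the transport layer the same kind of object, which is where the code re-use would come from.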


My main reasons for suggesting storing the logs in LDAP are:

1. So we can have optional attributes in each log entry. This is needed when we "explode" the current message blob so it can be queried efficiently. With JDBM I guess we would have to specify a new table for each type of message.

2. To reduce the code complexity. We would have virtually the same code for sending whole entries as for sending the logs, and we would have less code for dealing with the data storage in general.

3. To reduce the current tight coupling with the backend database. By using LDAP as the abstraction layer we could leverage ApacheDS' existing mechanism for specifying the data store.

4. To allow an easy way to view the logs.

5. It seems to be the most natural fit. Since we need to store (part of) an LDAP entry in the logs, why not store it in LDAP?

I'll take another stab at explaining that: we already have code to store LDAP entries in a database, so why would we want to write that again?
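As an illustration of point 1, this sketch builds the attributes of one "exploded" log entry with the standard JNDI `javax.naming.directory` classes. The object class `replicationLogEntry` and the attribute names (`logCsn`, `targetDn`, `changeType`, `modifiedAttribute`, `addedValue`, `removedValue`) are hypothetical; no such schema exists in ApacheDS. The point is only that optional attributes let a single entry shape cover every message type, where JDBM might need one table per type.

```java
import java.util.List;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;

final class LogEntrySketch {
    // Builds the attributes of one "exploded" log entry. The object class and
    // attribute names are hypothetical; no such schema exists in ApacheDS.
    static Attributes toLogEntry(String csn, String targetDn, String changeType,
                                 String modifiedAttribute,
                                 List<String> addedValues, List<String> removedValues) {
        Attributes attrs = new BasicAttributes(true); // attribute IDs compared case-insensitively

        Attribute objectClass = new BasicAttribute("objectClass");
        objectClass.add("top");
        objectClass.add("replicationLogEntry");
        attrs.put(objectClass);

        // Attributes every log entry carries.
        attrs.put("logCsn", csn);
        attrs.put("targetDn", targetDn);
        attrs.put("changeType", changeType);

        // Optional attributes: present only when this kind of change needs them,
        // so one entry shape covers add, modify, delete and so on.
        if (modifiedAttribute != null) {
            attrs.put("modifiedAttribute", modifiedAttribute);
        }
        putIfNotEmpty(attrs, "addedValue", addedValues);
        putIfNotEmpty(attrs, "removedValue", removedValues);
        return attrs;
    }

    private static void putIfNotEmpty(Attributes attrs, String id, List<String> values) {
        if (values == null || values.isEmpty()) {
            return; // omit the attribute entirely rather than storing an empty one
        }
        Attribute attribute = new BasicAttribute(id);
        for (String value : values) {
            attribute.add(value);
        }
        attrs.put(attribute);
    }
}
```

Because each value is a separate attribute value rather than part of a serialized blob, a search filter could match on, say, `modifiedAttribute` directly.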



> Oh this reminds me that we also need to make sure we're generating
> UUIDs all the time even if replication is not enabled.

Yeah, we have a JIRA issue about this: https://issues.apache.org/jira/browse/DIRSERVER-776



>> The biggest concern I have for this is the inflexibility of LDAP
>> searches. Do we have a sort control in ApacheDS?
> What types of searches do you envision performing, for which LDAP
> is too inflexible? OpenLDAP's syncrepl can be pretty much entirely
> mapped onto plain search operations. We gain a lot of versatility
> by keeping things generic.

We need to search for log entries beyond a certain CSN and have the results ordered by CSN. I guess if the results are always returned in creation-date order then it might not be an issue (I'm not yet sure what ApacheDS does or what the LDAP standard says). Currently we also find the current CSN vector by just getting the most recent log entry: we do this by performing a search sorted by CSN in descending order with a maximum of 1 result. Also, if we have the attributes in a child entry of the actual log entry, as I suggested, we would need to specify a parent-child relationship in the search.
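The two queries described above can be sketched in memory with a sorted map keyed by CSN. This is only an illustration of the required semantics, not a proposal for how ApacheDS should store the log: the `Csn` layout (timestamp, then replica ID and sequence number as tie-breakers) and the `CsnLog` class are assumptions. Against a real LDAP backend the same results would need a CSN filter plus the server-side sort control from RFC 2891 (`javax.naming.ldap.SortControl` in JNDI) and a count limit of 1; whether ApacheDS supports that control is the open question here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical CSN layout: a timestamp, then replica ID and a per-replica
// sequence number as tie-breakers. Not the actual Mitosis CSN class.
record Csn(long timestamp, int replicaId, int seq) implements Comparable<Csn> {
    private static final Comparator<Csn> ORDER = Comparator
            .comparingLong(Csn::timestamp)
            .thenComparingInt(Csn::replicaId)
            .thenComparingInt(Csn::seq);

    @Override
    public int compareTo(Csn other) {
        return ORDER.compare(this, other);
    }
}

final class CsnLog {
    private final NavigableMap<Csn, String> changes = new TreeMap<>();

    void append(Csn csn, String change) {
        changes.put(csn, change);
    }

    // "Log entries beyond a certain CSN, ordered by CSN": an exclusive tail view
    // of the sorted map, iterated in ascending CSN order.
    List<String> changesAfter(Csn since) {
        return new ArrayList<>(changes.tailMap(since, false).values());
    }

    // "Most recent log entry": what a descending sort with a size limit of 1 returns.
    Csn latest() {
        return changes.isEmpty() ? null : changes.lastKey();
    }
}
```

Note that the ordering comes from the CSN itself, not from insertion or creation order, so out-of-order arrivals from different peers still come back sorted.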



> Active Directory has a lot of misfeatures... Having spent a couple
> weeks of "quality time" with it, the flaws just leap out... Do you
> really like the idea of carrying obsolete info around and needing a
> sweep task to go thru and clean up periodically?

My thoughts exactly. I'll try and do some more research here and dig out the reason AD uses them, but I think I'll leave that to a separate thread.



> For example you have a delete of a node occur right when you add a
> child to it.  The server would probably put the child into some
> lost and found area and alert the administrator.  With tombstoning
> you can easily resuscitate the deleted parent and move the child
> back under it.

Resuscitating a deleted entry seems like something most people wouldn't want. If we are attempting to simulate a single server as closely as possible (which is my main aim), then the new child entry should be deleted when the peers synchronise. As you said, we could have an optional lost and found area for cases where conflict resolution causes data loss like this, along with optional notifications to an administrator.
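A minimal sketch of that policy, assuming entries are held in a simple DN-to-data map: when a parent delete arrives from a peer, any child added locally in the meantime is deleted as well, and an optional lost and found area keeps a copy and queues an administrator notification. `ParentDeleteResolver` and its methods are hypothetical, and the DN suffix test is deliberately naive (no normalisation, case folding or escaping).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ParentDeleteResolver {
    private final Map<String, String> entries = new TreeMap<>();      // DN -> entry data
    private final Map<String, String> lostAndFound = new TreeMap<>(); // optional holding area
    private final List<String> adminNotifications = new ArrayList<>();
    private final boolean keepOrphans;

    ParentDeleteResolver(boolean keepOrphans) {
        this.keepOrphans = keepOrphans;
    }

    void addLocal(String dn, String data) {
        entries.put(dn, data);
    }

    // A delete of 'dn' arrives from a peer that had no children under it. Anything
    // added here under 'dn' in the meantime is removed too, as a single server
    // would never have allowed both operations to succeed.
    void applyRemoteDelete(String dn) {
        List<String> subtree = new ArrayList<>();
        for (String candidate : entries.keySet()) {
            // Naive suffix test: ignores DN normalisation, case and escaping.
            if (candidate.equals(dn) || candidate.endsWith("," + dn)) {
                subtree.add(candidate);
            }
        }
        for (String victim : subtree) {
            String data = entries.remove(victim);
            if (keepOrphans && !victim.equals(dn)) {
                lostAndFound.put(victim, data);
                adminNotifications.add("Orphaned entry moved to lost and found: " + victim);
            }
        }
    }

    boolean contains(String dn) { return entries.containsKey(dn); }

    boolean inLostAndFound(String dn) { return lostAndFound.containsKey(dn); }

    List<String> notifications() { return List.copyOf(adminNotifications); }
}
```

With `keepOrphans` off the result matches single-server behaviour exactly; turning it on trades that purity for a way to recover data the conflict resolution would otherwise discard.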



>> Also our MMR support is still immature, we don't yet do value-level
>> conflict resolution.
> Yeah, we have yet to consider that.

We will have this once I have fixed https://issues.apache.org/jira/browse/DIRSERVER-894.



> The trick to get from basic single-master to basic (entry-level
> only) multi-master is just to store multiple contextCSNs - one for
> each peer master, and ignore entry updates that are older than an
> entry's current entryCSN. The other requirement here is that you
> have reliable, tightly synced clocks, otherwise the conflict
> resolution policy falls apart.

That's exactly how our replication module works at the moment, except we just send the changes rather than the whole entry. I am currently looking at improving the way we store the logs so we can efficiently do attribute-value-level conflict resolution. I suspect that I will end up with something very similar to delta-syncrepl. I will try and dig out some information on that from the OpenLDAP mailing list.
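The entry-level scheme quoted above (one contextCSN per peer master, and ignore any update older than the entry's current entryCSN) can be sketched as follows. `Csn` and `EntryLevelReplica` are hypothetical names, the CSN layout is an assumption, and whole entry data is replaced on each accepted update, so this shows only the entry-level rule, not attribute-value-level resolution. As the quote warns, the comparison is only meaningful if the replicas' clocks are tightly synced.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical CSN layout: timestamp first, then replica ID and sequence number as tie-breakers.
record Csn(long timestamp, int replicaId, int seq) implements Comparable<Csn> {
    private static final Comparator<Csn> ORDER = Comparator
            .comparingLong(Csn::timestamp)
            .thenComparingInt(Csn::replicaId)
            .thenComparingInt(Csn::seq);

    @Override
    public int compareTo(Csn other) {
        return ORDER.compare(this, other);
    }
}

final class EntryLevelReplica {
    private final Map<Integer, Csn> contextCsns = new HashMap<>(); // replica ID -> newest CSN seen from that peer
    private final Map<String, Csn> entryCsns = new HashMap<>();    // DN -> entryCSN of the current state
    private final Map<String, String> entries = new HashMap<>();   // DN -> entry data

    // Returns true if the update was applied, false if it was ignored as stale.
    boolean applyUpdate(String dn, String data, Csn csn) {
        // Track the newest CSN received from each peer master, whether or not it wins.
        contextCsns.merge(csn.replicaId(), csn,
                (seen, incoming) -> seen.compareTo(incoming) >= 0 ? seen : incoming);

        Csn current = entryCsns.get(dn);
        if (current != null && csn.compareTo(current) <= 0) {
            return false; // not newer than the entry's entryCSN: ignore
        }
        entries.put(dn, data);
        entryCsns.put(dn, csn);
        return true;
    }

    String get(String dn) { return entries.get(dn); }

    Csn contextCsn(int replicaId) { return contextCsns.get(replicaId); }
}
```

Sending only the changes, as our module does, means the same check would have to be made per attribute value rather than per entry, which is why the log storage format matters for value-level conflict resolution.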


Martin
