[ApacheDS][Mitosis] Replication data

Martin Alderson Thu, 22 Nov 2007 15:27:31 -0800

Hi all,

I am currently looking into some of the replication issues, specifically DIRSERVER-894 ("Older concurrent changes are never replicated"), DIRSERVER-1097 ("Only send net changes during replication") and DIRSERVER-1101 ("New replicas may never receive some recent modifications").

I think these issues will require changing the replication data format. Currently the replication logs are stored in a single database table with time, replica ID, sequence number and operation columns. The first three make up the CSN and the last holds a serialised operation object.
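To make the CSN structure concrete, here is a minimal Java sketch of a value built from those three columns. The class name SketchCsn, the field types, and the comparison order (time, then replica ID, then sequence number, i.e. the order the columns are listed above) are assumptions for illustration only, not the actual Mitosis classes.

```java
import java.util.Comparator;

/** Hypothetical CSN value: time, replica ID and sequence number (assumed types). */
final class SketchCsn implements Comparable<SketchCsn> {
    final long time;              // timestamp column
    final String replicaId;       // replica ID column
    final int operationSequence;  // sequence number column

    SketchCsn(long time, String replicaId, int operationSequence) {
        this.time = time;
        this.replicaId = replicaId;
        this.operationSequence = operationSequence;
    }

    // Assumed ordering: compare the columns in the order they are listed.
    private static final Comparator<SketchCsn> ORDER =
        Comparator.comparingLong((SketchCsn c) -> c.time)
                  .thenComparing(c -> c.replicaId)
                  .thenComparingInt(c -> c.operationSequence);

    @Override
    public int compareTo(SketchCsn other) {
        return ORDER.compare(this, other);
    }
}
```

With an ordering like this, "the CSN at the point an attribute was last modified" is simply the largest CSN among the matching log rows.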

DIRSERVER-894 needs a way to work out the CSN at the point a specific attribute was last modified. DIRSERVER-1097 needs a way to find previous log entries based on entryUUID, modification type and attribute ID. We are also planning on moving the replication data to the DIT. Given all this I am thinking of removing the serialised operation blob and replacing it with extra table(s) for each operation type storing the operation's data across multiple columns. This will allow us to efficiently query the replication logs based on the operation data.

Perhaps this would be a good time to make the jump to storing the replication data in the DIT. The DIT seems well suited to storing the operations in an "exploded" format. I am thinking of the following kind of format:


ou=logs/
  cn=<csn>/
      objectClass: ... (indicates operation type)
      time: ...
      replicaID: ...
      operationSequence: ...
      entryUUID: ...
      attributeID: <attributeName> (for attribute modifications)
      cn=attributes/
        <attributeName>: <attributeValues>

The biggest concern I have for this is the inflexibility of LDAP searches. Do we have a sort control in ApacheDS? Also, if we have the attributes for the operation in a child entry, how can we find an operation in the logs based on those attributes?

At the same time I am thinking about a couple of things in the replication system that don't seem to be necessary.

Firstly, once DIRSERVER-894 is fixed, I don't think we will need the entryCSN attribute. I believe that it is only used to check whether an operation should be applied to an entry or not (i.e. whether it is a new modification), but this is broken and we need to check the CSN per attribute by using the logs instead.
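As a rough illustration of the per-attribute check, the following Java sketch looks up the latest logged CSN for one attribute of one entry and applies an incoming modification only if it is newer. Everything here is hypothetical: the names LogRow and LogQueries, the in-memory list standing in for the log store, and the use of a plain long in place of a full CSN.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical "exploded" log row; a long stands in for the full CSN. */
final class LogRow {
    final long csn;
    final String entryUuid;
    final String attributeId;

    LogRow(long csn, String entryUuid, String attributeId) {
        this.csn = csn;
        this.entryUuid = entryUuid;
        this.attributeId = attributeId;
    }
}

final class LogQueries {
    /** Latest CSN logged for this attribute of this entry, if any. */
    static Optional<Long> lastModified(List<LogRow> log, String entryUuid, String attributeId) {
        return log.stream()
                  .filter(r -> r.entryUuid.equals(entryUuid)
                            && r.attributeId.equals(attributeId))
                  .map(r -> r.csn)
                  .max(Long::compare);
    }

    /** Apply only if no logged change to the same attribute is as new or newer. */
    static boolean shouldApply(List<LogRow> log, LogRow incoming) {
        return lastModified(log, incoming.entryUuid, incoming.attributeId)
                .map(last -> incoming.csn > last)
                .orElse(true);
    }
}
```

Because the check is keyed on entryUUID and attribute ID rather than on a single entry-wide CSN, an older change to one attribute is not rejected just because a different attribute of the same entry was modified more recently.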

Secondly, I don't really see the point of "tombstoning" entries (marking them as deleted instead of really deleting them). The only time I can see it having any kind of effect is when a replica receives a modification for an entry it thinks has been deleted - then it will resurrect it. This seems like a very bad idea to me. I would expect this to be a fatal replication error as something has gone seriously wrong.

Sorry for the long email... if anyone's managed to read this far any comments would be much appreciated.


Thanks,

Martin
