Hi all,
I am currently looking into some of the replication issues, specifically
DIRSERVER-894 ("Older concurrent changes are never replicated"),
DIRSERVER-1097 ("Only send net changes during replication") and
DIRSERVER-1101 ("New replicas may never receive some recent modifications").
I think these issues will require changing the replication data format.
Currently the replication logs are stored in a single database table
with time, replica ID, sequence number and operation columns. The first
three make up the CSN and the last holds a serialised operation object.
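For reference, here is a rough Java sketch of how a CSN assembled from those three columns might be ordered. The `Csn` record and the ordering (time first, then replica ID, then sequence number) are my assumptions for illustration, not necessarily what the current code does.

```java
import java.util.Comparator;

// Hypothetical CSN built from the three log-table columns.
// The ordering (time, then replica ID, then sequence) is an assumption.
record Csn(long time, String replicaId, int operationSequence) implements Comparable<Csn> {

    private static final Comparator<Csn> ORDER =
            Comparator.comparingLong(Csn::time)
                      .thenComparing(Csn::replicaId)
                      .thenComparingInt(Csn::operationSequence);

    @Override
    public int compareTo(Csn other) {
        return ORDER.compare(this, other);
    }
}
```

With this ordering, two changes made at the same time on different replicas are still totally ordered by replica ID, so every replica can agree on which one is "newer".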
DIRSERVER-894 needs a way to find the CSN at which a specific
attribute was last modified. DIRSERVER-1097 needs a way to find
previous log entries based on entryUUID, modification type and attribute
ID. We are also planning to move the replication data into the DIT.
Given all this, I am thinking of removing the serialised operation blob
and replacing it with one or more extra tables per operation type, storing
the operation's data across multiple columns. This would allow us to
efficiently query the replication logs based on the operation data.
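To make the "query by operation data" idea concrete, here is a minimal in-memory Java sketch of an exploded modification log covering the two lookups above. `ModificationLog`, `Row` and `ModType` are hypothetical names; with real tables these would be SQL queries, roughly `WHERE entry_uuid = ? AND attribute_id = ? ORDER BY <csn columns> DESC` (column names also hypothetical).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory stand-in for an "exploded" modification log table.
class ModificationLog {

    enum ModType { ADD, REPLACE, REMOVE }

    // One row per modified attribute instead of one serialised blob per operation.
    // The CSN is flattened into comparable columns: time, replicaId, sequence.
    record Row(long time, String replicaId, int sequence,
               String entryUuid, ModType modType, String attributeId) { }

    private static final Comparator<Row> BY_CSN =
            Comparator.comparingLong(Row::time)
                      .thenComparing(Row::replicaId)
                      .thenComparingInt(Row::sequence);

    private final List<Row> rows = new ArrayList<>();

    void append(Row row) {
        rows.add(row);
    }

    // DIRSERVER-1097 style lookup: previous entries for one entry, mod type and attribute.
    List<Row> find(String entryUuid, ModType modType, String attributeId) {
        return rows.stream()
                .filter(r -> r.entryUuid().equals(entryUuid)
                          && r.modType() == modType
                          && r.attributeId().equals(attributeId))
                .sorted(BY_CSN)
                .toList();
    }

    // DIRSERVER-894 style lookup: the most recent change to one attribute of one entry.
    Optional<Row> lastChange(String entryUuid, String attributeId) {
        return rows.stream()
                .filter(r -> r.entryUuid().equals(entryUuid)
                          && r.attributeId().equals(attributeId))
                .max(BY_CSN);
    }
}
```

Neither lookup is possible against an opaque blob without deserialising every row, which is the main argument for splitting the data into columns.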
Perhaps this would be a good time to make the jump to storing the
replication data in the DIT, which seems well suited to storing the
operations in an "exploded" format. I am thinking of something like
the following layout:
  ou=logs/
    cn=<csn>/
      objectClass: ... (indicates operation type)
      time: ...
      replicaID: ...
      operationSequence: ...
      entryUUID: ...
      attributeID: <attributeName> (for attribute modifications)
      cn=attributes/
        <attributeName>: <attributeValues>
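A rough sketch of how one log operation could be mapped onto this layout using the standard JNDI name and attribute classes (`javax.naming.ldap.LdapName`, `javax.naming.directory.BasicAttributes`). `LogEntryLayout` is a hypothetical helper, and `ou=logs`, the attribute names and the objectClass value are placeholders taken from the layout above, not an existing ApacheDS schema.

```java
import javax.naming.InvalidNameException;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical helper mapping one log operation onto the proposed DIT layout.
// "ou=logs", the attribute names and the objectClass value are placeholders.
class LogEntryLayout {

    static final String LOGS_BASE = "ou=logs";

    // DN of the operation entry: cn=<csn>,ou=logs
    static LdapName operationDn(String csn) {
        try {
            LdapName dn = new LdapName(LOGS_BASE);
            dn.add(new Rdn("cn", csn)); // add() puts the RDN at the most specific (leftmost) end
            return dn;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Cannot build DN for CSN " + csn, e);
        }
    }

    // DN of the child holding the modified values: cn=attributes,cn=<csn>,ou=logs
    static LdapName attributesDn(String csn) {
        try {
            LdapName dn = operationDn(csn);
            dn.add(new Rdn("cn", "attributes"));
            return dn;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Cannot build DN for CSN " + csn, e);
        }
    }

    // Attributes of the operation entry itself; the attribute values live in the child.
    static Attributes operationAttributes(String objectClass, long time, String replicaId,
                                          int sequence, String entryUuid, String attributeId) {
        Attributes attrs = new BasicAttributes(true); // true: attribute IDs are case-insensitive
        attrs.put(new BasicAttribute("objectClass", objectClass));
        attrs.put("time", Long.toString(time));
        attrs.put("replicaID", replicaId);
        attrs.put("operationSequence", Integer.toString(sequence));
        attrs.put("entryUUID", entryUuid);
        if (attributeId != null) { // only present for attribute modifications
            attrs.put("attributeID", attributeId);
        }
        return attrs;
    }
}
```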
The biggest concern I have with this is the inflexibility of LDAP
searches. Do we have a sort control in ApacheDS? Also, if the attributes
for an operation are held in a child entry, how can we find an
operation in the logs based on those attributes?
At the same time I am thinking about a couple of things in the
replication system that don't seem to be necessary.
Firstly, once DIRSERVER-894 is fixed, I don't think we will need the
entryCSN attribute. I believe it is only used to check whether an
operation should be applied to an entry or not (i.e. whether it is a newer
modification), but this check is broken: we need to check the CSN per
attribute using the logs instead.
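To illustrate the DIRSERVER-894 problem, here is a small Java sketch contrasting an entry-level check with a per-attribute one. A plain `long` stands in for a full CSN, and `ConflictCheck` and its methods are hypothetical; in the proposal the per-attribute CSNs would come from the replication logs rather than a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of entry-level vs per-attribute conflict checks.
// A plain long stands in for a CSN; a larger number means a later change.
class ConflictCheck {

    // Entry-level check: a single entryCSN guards the whole entry.
    static boolean acceptByEntryCsn(long entryCsn, long incomingCsn) {
        return incomingCsn > entryCsn;
    }

    // Per-attribute check: compare only with the last CSN seen for that attribute.
    static boolean acceptByAttributeCsn(Map<String, Long> lastCsnByAttribute,
                                        String attributeId, long incomingCsn) {
        Long last = lastCsnByAttribute.get(attributeId);
        return last == null || incomingCsn > last;
    }

    public static void main(String[] args) {
        // Replica A changed "mail" at CSN 5; replica B concurrently changed "cn" at CSN 3.
        // On A, entryCSN is already 5 when B's change arrives.
        Map<String, Long> lastCsn = new HashMap<>();
        lastCsn.put("mail", 5L);
        System.out.println(acceptByEntryCsn(5L, 3L));                // false: B's change is lost
        System.out.println(acceptByAttributeCsn(lastCsn, "cn", 3L)); // true: no newer change to "cn"
    }
}
```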
Secondly, I don't really see the point of "tombstoning" entries (marking
them as deleted instead of really deleting them). The only time I can
see it having any kind of effect is when a replica receives a
modification for an entry it thinks has been deleted - then it will
resurrect it. This seems like a very bad idea to me. I would expect
this to be a fatal replication error as something has gone seriously wrong.
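A minimal sketch of the two policies for a modification that arrives for an entry the replica believes was deleted. `DeletedEntryPolicy`, its `State` enum and `ReplicationConflictException` are hypothetical names, and the entry store is reduced to a map from entryUUID to state.

```java
import java.util.Map;

// Hypothetical sketch of two policies for a modification that targets
// an entry the local replica believes was deleted.
class DeletedEntryPolicy {

    enum State { LIVE, TOMBSTONE }

    static class ReplicationConflictException extends RuntimeException {
        ReplicationConflictException(String message) {
            super(message);
        }
    }

    // Tombstone behaviour as described: the modification resurrects the entry.
    static void applyWithTombstones(Map<String, State> entries, String entryUuid) {
        if (entries.get(entryUuid) == State.TOMBSTONE) {
            entries.put(entryUuid, State.LIVE);
        }
        // ... apply the modification itself ...
    }

    // Proposed behaviour: deleted entries are really removed, and a modification
    // for a missing entry is reported as a fatal replication error.
    static void applyStrict(Map<String, State> entries, String entryUuid) {
        if (!entries.containsKey(entryUuid)) {
            throw new ReplicationConflictException("Modification for deleted entry " + entryUuid);
        }
        // ... apply the modification itself ...
    }
}
```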
Sorry for the long email... If anyone has managed to read this far, any
comments would be much appreciated.
Thanks,
Martin