Alex Karasulu wrote:
It seems that that would be well suited to
storing the operations in an "exploded" format. I am thinking of the
following kind of format:
ou=logs/
  cn=<csn>/
    objectClass: ... (indicates operation type)
    time: ...
    replicaID: ...
    operationSequence: ...
    entryUUID: ...
    attributeID: <attributeName> (for attribute modifications)
    cn=attributes/
      <attributeName>: <attributeValues>
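
To make the layout above concrete, here is a rough Python sketch of how one logged operation might be "exploded" into entries under ou=logs. The function name, the dict representation of an entry, and the sample values are assumptions for illustration only; nothing here is ApacheDS API.

```python
def explode_operation(csn, op_type, time, replica_id, op_seq, entry_uuid,
                      attr_name=None, attr_values=()):
    """Return a list of (dn, attributes) pairs for one logged operation.

    Hypothetical helper: mirrors the ou=logs / cn=<csn> / cn=attributes
    layout sketched in the mail, using plain dicts instead of real entries.
    """
    op_dn = f"cn={csn},ou=logs"
    op_attrs = {
        "objectClass": [op_type],          # indicates the operation type
        "time": [time],
        "replicaID": [replica_id],
        "operationSequence": [str(op_seq)],
        "entryUUID": [entry_uuid],
    }
    entries = [(op_dn, op_attrs)]
    if attr_name is not None:
        # Attribute modifications carry the attribute name on the operation
        # entry and the values in a child entry.
        op_attrs["attributeID"] = [attr_name]
        entries.append((f"cn=attributes,{op_dn}",
                        {attr_name: list(attr_values)}))
    return entries


entries = explode_operation("csn1", "modifyOperation", "20080101000000Z",
                            "replica-a", 0, "uuid-1",
                            attr_name="mail", attr_values=["a@example.com"])
for dn, attrs in entries:
    print(dn, attrs)
```
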
The biggest concern I have for this is the inflexibility of LDAP
searches. Do we have a sort control in ApacheDS?
What types of searches do you envision performing, for which LDAP is too
inflexible? OpenLDAP's syncrepl can be pretty much entirely mapped onto plain
search operations. We gain a lot of versatility by keeping things generic.
At the same time I am thinking about a couple of things in the
replication system that don't seem to be necessary.
Firstly, once DIRSERVER-894 is fixed, I don't think we will need the
entryCSN attribute. I believe that it is only used to check whether an
operation should be applied to an entry or not (i.e. is it a new
modification), but this is broken and we need to check the CSN per
attribute by using the logs instead.
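
A minimal sketch of the per-attribute check described above, assuming the log is a list of dicts and that a CSN can be compared as a (timestamp, replicaID, operationSequence) tuple. Both the record shape and the function names are hypothetical.

```python
def latest_attribute_csn(log, entry_uuid, attribute_id):
    """Newest CSN logged for this attribute of this entry, or None."""
    csns = [op["csn"] for op in log
            if op["entryUUID"] == entry_uuid
            and op.get("attributeID") == attribute_id]
    return max(csns, default=None)


def should_apply(log, incoming):
    """Apply an incoming modification only if it is newer than anything
    already logged for the same attribute of the same entry."""
    latest = latest_attribute_csn(log, incoming["entryUUID"],
                                  incoming["attributeID"])
    return latest is None or incoming["csn"] > latest


log = [
    {"csn": (10, "A", 0), "entryUUID": "u1", "attributeID": "mail"},
    {"csn": (20, "B", 0), "entryUUID": "u1", "attributeID": "cn"},
]
newer = {"csn": (15, "B", 0), "entryUUID": "u1", "attributeID": "mail"}
stale = {"csn": (5, "B", 0), "entryUUID": "u1", "attributeID": "mail"}
print(should_apply(log, newer), should_apply(log, stale))
```

Note that the newer modification is applied even though the entry as a whole has a later change (to cn) in the log; a single entry-level CSN cannot express that.
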
In some ways the entryCSN is redundant info, but it's still useful for 3rd
party clients. It's essential for OpenLDAP syncrepl.
Right, no problem; if you want to axe it we can do that. Oh, this reminds
me that we also need to make sure we're generating UUIDs all the time,
even if replication is not enabled. We want to have the entryUUID as an
operational attribute of all entries so that when replication is turned
on, things work. We can also use the UUID for many other things.
Yes.
Secondly, I don't really see the point of "tombstoning" entries (marking
them as deleted instead of really deleting them). The only time I can
see it having any kind of effect is when a replica receives a
modification for an entry it thinks has been deleted - then it will
resurrect it. This seems like a very bad idea to me. I would expect
this to be a fatal replication error as something has gone seriously
wrong.
I've got to admit that I'm not well versed enough on this topic to
answer you on this, but I do know that it is a valid technique that is
widely practiced in replication theory. For example it's used in Active
Directory. So I would recommend researching this topic a little bit but
I'm open to anything as long as we are educated about it.
Active Directory has a lot of misfeatures... Having spent a couple weeks of
"quality time" with it, the flaws just leap out... Do you really like the idea
of carrying obsolete info around and needing a sweep task to go thru and clean
up periodically?
Sorry for the long email... if anyone's managed to read this far any
comments would be much appreciated.
Hey, it took me a while, sorry for that. This is a very important topic
that we need to get right. I also have a couple of other points or
topics I want to touch on.
(1) I think it would be really nice to be able to replicate with
OpenLDAP and also learn about the sync replication mechanism used.
Perhaps they have some nice techniques which we have not thought of yet.
It's fair to say that we've faced the same issues already ;) Also, our MMR
support is still immature; we don't yet do value-level conflict resolution.
But the plan for that is pretty straightforward.
(2) I know OpenLDAP leverages a changelog that is similar to, but not exactly
the same as, our changelog. Perhaps we need to explore this relationship and
figure out how to better leverage this changelog. I think the CSN is
synonymous with a revision, except revisions are local and CSNs are global.
Normal syncrepl doesn't rely on any logs; it simply uses entryCSNs. It
replicates whole entries (and therefore MMR only provides entry-level conflict
resolution). It can use a session log to optimize the replication of delete
operations, but doesn't actually need that.
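
As a rough illustration of replicating by entryCSN alone, here is a Python sketch of a refresh pass: the consumer remembers the highest CSN it has seen, and the provider returns whole entries whose entryCSN is greater. The in-memory entry list, the simplified CSN strings, and the function name are assumptions; a real provider would answer this with an ordinary search.

```python
def refresh(provider_entries, cookie_csn):
    """Return (changed entries, new cookie) for a consumer at cookie_csn.

    Assumes CSNs are strings that sort chronologically, which is a
    simplification made for this sketch.
    """
    changed = sorted(
        (e for e in provider_entries if e["entryCSN"] > cookie_csn),
        key=lambda e: e["entryCSN"],
    )
    new_cookie = changed[-1]["entryCSN"] if changed else cookie_csn
    return changed, new_cookie


provider = [
    {"entryUUID": "u1", "entryCSN": "003", "cn": ["alice"]},
    {"entryUUID": "u2", "entryCSN": "001", "cn": ["bob"]},
    {"entryUUID": "u3", "entryCSN": "002", "cn": ["carol"]},
]
changed, cookie = refresh(provider, "001")
print([e["entryUUID"] for e in changed], cookie)
```

Whole entries come back, not individual changes, which is why this mode can only resolve conflicts at the entry level.
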
Delta-syncrepl uses the log schema (which I pointed you at already) to
replicate only individual changes. This is the mechanism we'll be extending to
provide value-level conflict resolution for MMR. The basic approach is that
with every delta received, we also send the entry's old entryCSN. If that
doesn't match the entryCSN on the replica, then some other write has occurred
and there is a potential conflict. At that point we can search backward
through the changelog for that entryUUID or entryCSN and find the point of
divergence.
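
A minimal Python sketch of that check, assuming the changelog is a chronologically ordered list of dicts and that each delta carries the entry's previous entryCSN in a field called oldCSN. The field names and record shapes are hypothetical and are not the actual log schema.

```python
def find_divergence(changelog, entry_uuid, old_csn):
    """Search backward for the record where this entry had old_csn.

    Returns its index in the changelog, or None if it is not found.
    """
    for i in range(len(changelog) - 1, -1, -1):
        rec = changelog[i]
        if rec["entryUUID"] == entry_uuid and rec["csn"] == old_csn:
            return i
    return None


def receive_delta(replica_entry, changelog, delta):
    """Decide whether a delta applies cleanly or needs conflict handling."""
    if delta["oldCSN"] == replica_entry["entryCSN"]:
        return "apply", []
    # Some other write touched the entry: collect the local changes made
    # since the point of divergence so they can be compared with the delta.
    idx = find_divergence(changelog, replica_entry["entryUUID"],
                          delta["oldCSN"])
    if idx is None:
        return "conflict", None
    local = [r for r in changelog[idx + 1:]
             if r["entryUUID"] == replica_entry["entryUUID"]]
    return "conflict", local


changelog = [
    {"csn": 1, "entryUUID": "u1", "change": "add"},
    {"csn": 2, "entryUUID": "u2", "change": "add"},
    {"csn": 3, "entryUUID": "u1", "change": "replace mail"},
]
entry = {"entryUUID": "u1", "entryCSN": 3}
clean = receive_delta(entry, changelog, {"oldCSN": 3, "csn": 5})
clash = receive_delta(entry, changelog, {"oldCSN": 1, "csn": 4})
print(clean, clash)
```

What to do with the list of local changes (the value-level resolution itself) is left out, since that part is still only a plan.
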
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/