Re: [ApacheDS] Implementing isolation using multi-version concurrency control (MVCC)

Emmanuel Lecharny Sat, 31 Jan 2009 02:25:54 -0800

Alex Karasulu wrote:

Hi all,


Emmanuel and I were having an interesting conversation about the kinds of
transaction properties needed by ApacheDS to comply with various
specification requirements. For example, all LDAP operations must be atomic
and must preserve consistency. Furthermore, one can easily debate the need
for isolation where any operation does not see or get impacted by the
partial changes of another concurrent operation.

We started discussing these ACID properties and ways in which they could be
implemented. Isolation is very interesting now thanks to directory
versioning and the change log. We can easily implement isolation now. When a
relatively lengthy operation like a search is being conducted, it should not
see modifications within scope that occur after the search began. The search
operation in the example can be replaced with any other operation minus all
unbind, some extended, and all abandon requests.

Atomicity and Isolation are both complex to guarantee in an LDAP server.

If we think about Atomicity, for instance, even if we can somehow guarantee it for simple operations like Modify, Add or Delete, the ModDN operation is not that simple. We have to guarantee that all the potential renames are done (or reverted) as a whole. As this operation can impact a big part of the server, and could take several seconds (minutes, hours, depending on the number of entries), this is obviously not trivial. However, ModDN operations aren't the most frequent ones. Regarding the simpler operations (add, delete and modify), I think we should implement some kind of "transaction" in the backend: the modified entry has to be tagged as 'under modification' until the backend has correctly applied the modification (or rolled it back). Then we can untag the entry, and it's available again. How the CL can come into play here is to be discussed. IMHO, the CL and this 'transaction' will work hand in hand at some point.
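The tag/untag idea for the simpler operations can be sketched in Java, the language ApacheDS is written in. This is a minimal sketch under stated assumptions, not ApacheDS code: `EntryModificationGuard`, `Result` and `modify` are hypothetical names, entries are identified by their DN as a plain string, and the caller supplies the backend update and its rollback. It does not address ModDN, where many entries would have to be tagged at once.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not ApacheDS code: an entry (identified by its DN) is tagged
// as 'under modification' while the backend applies a change, then untagged whether
// the change was applied or rolled back.
final class EntryModificationGuard {

    enum Result { COMMITTED, ROLLED_BACK, BUSY }

    // DNs of the entries currently tagged as 'under modification'
    private final Set<String> underModification = ConcurrentHashMap.newKeySet();

    boolean isUnderModification(String dn) {
        return underModification.contains(dn);
    }

    Result modify(String dn, BooleanSupplier backendUpdate, Runnable rollback) {
        // add() on a concurrent key set is atomic: only one operation can tag a DN
        if (!underModification.add(dn)) {
            return Result.BUSY; // another operation holds the tag; the caller may retry
        }
        try {
            boolean applied;
            try {
                applied = backendUpdate.getAsBoolean();
            } catch (RuntimeException e) {
                applied = false; // treat a failing update like an unsuccessful one
            }
            if (applied) {
                return Result.COMMITTED;
            }
            rollback.run(); // undo any partial change before releasing the entry
            return Result.ROLLED_BACK;
        } finally {
            // Untag: the entry is available again, whatever the outcome
            underModification.remove(dn);
        }
    }
}
```

The concurrent key set keeps the tag check and the tagging itself in one atomic step. Whether a busy entry should make the caller wait, retry or fail is left open here, as is the interplay with the CL.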

Regarding isolation, it's a bit more difficult, as a search can already have sent back some results which could be changed by another modify operation. This is especially the case for a ModDN operation.

The change log not only tracks each change, but also allows the directory
server to inject the "revisions" attribute into entries. The revisions
attribute is multi-valued and lists all the revisions of changes which
altered the entry. For the search example, we can conduct the search
operation while locking it down to a revision.

That does not work for deleted entries, obviously ...

This is best implemented by
conditionally filtering out or injecting candidates with revisions greater
than the revision at which the search operation started. Let's call the
revision when the search started S. Entries in the server which possess
revision numbers greater than S need further evaluation: we have to evaluate
whether the filter matches these entries as they were at revision S. This
may require some reverse patching and re-evaluation of the filter on the
entry patched back to its state at S. This is not so bad because there
usually are not that many changes taking place at the same time on the
same entry, so the number of LDIFs to apply to an entry to evaluate it in
its former state at S will be small. This way we effectively lock down the
search to a specific revision, giving the search operation what appears to
be a snapshot of the DIT. The search results will not be impacted by any
concurrent changes.
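The revision lock-down described above can be sketched as follows. It is only an illustration under simplifying assumptions: `SnapshotSearch`, `VersionedEntry` and `Change` are hypothetical types, attributes are single-valued, and each change stores a reverse patch (the attribute values it replaced) rather than a real LDIF from the change log. Entries created after S are skipped, and entries deleted after S are not covered at all, since they are no longer there to be evaluated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the ApacheDS change log API: evaluate a search filter
// against an entry as it was at revision s, by reverse patching newer changes.
final class SnapshotSearch {

    /** Reverse patch for one revision: attribute -> value before the change (null = absent). */
    static final class Change {
        final long revision;
        final Map<String, String> before;

        Change(long revision, Map<String, String> before) {
            this.revision = revision;
            this.before = before;
        }
    }

    /** An entry's current attributes plus the changes that altered it, oldest first. */
    static final class VersionedEntry {
        final long createdAt;
        final Map<String, String> attributes;
        final List<Change> changes = new ArrayList<>();

        VersionedEntry(long createdAt, Map<String, String> attributes) {
            this.createdAt = createdAt;
            this.attributes = new HashMap<>(attributes);
        }

        /** Sets (or removes, if newValue is null) one attribute, recording the reverse patch. */
        void modify(long revision, String attr, String newValue) {
            Map<String, String> before = new HashMap<>();
            before.put(attr, attributes.get(attr)); // null if the attribute was absent
            changes.add(new Change(revision, before));
            if (newValue == null) {
                attributes.remove(attr);
            } else {
                attributes.put(attr, newValue);
            }
        }
    }

    /** Rebuilds the attributes as of revision s by undoing every change made after s. */
    static Map<String, String> stateAt(VersionedEntry e, long s) {
        Map<String, String> state = new HashMap<>(e.attributes);
        // Walk from the newest change backwards, stopping at the first one <= s
        for (int i = e.changes.size() - 1; i >= 0 && e.changes.get(i).revision > s; i--) {
            for (Map.Entry<String, String> undo : e.changes.get(i).before.entrySet()) {
                if (undo.getValue() == null) {
                    state.remove(undo.getKey());
                } else {
                    state.put(undo.getKey(), undo.getValue());
                }
            }
        }
        return state;
    }

    /** True if the entry existed at revision s and the filter matched its state at s. */
    static boolean matchesAt(VersionedEntry e, long s, Predicate<Map<String, String>> filter) {
        if (e.createdAt > s) {
            return false; // the entry did not exist when the search started
        }
        boolean changedAfterS = !e.changes.isEmpty()
                && e.changes.get(e.changes.size() - 1).revision > s;
        // Only entries touched after s need reverse patching
        return filter.test(changedAfterS ? stateAt(e, s) : e.attributes);
    }
}
```

Entries whose last change is at or before S are evaluated as they are; only entries touched after S pay for the reverse patching, which stays cheap when few changes hit the same entry.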

Well, I don't think this is the best approach. In any case, an LDAP search is considered a dirty read. We have no way to lock down modifications on the entries being read, so the user has _no_ guarantee that the entry he gets back will still be valid. Usually it doesn't matter, because the ratio of reads to writes on an LDAP server is so high that we can consider there are no modifications. So we can simply return the latest revision, whatever it is. Anyway, there is another aspect we have to consider: once the user gets his entry, and before he uses it, potentially sending it back as a modification to the server, the very same entry may have been modified in between. As we don't lock entries, we can't protect users from such a case.
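The read-modify-write problem can be made concrete with a small sequential simulation. This is a hypothetical sketch, not ApacheDS code: `LatestRevisionStore` simply serves the latest state of an entry without any lock, and two clients each add a member to the same group by replacing the `member` values with a set computed from what they read earlier.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a store that always returns the latest revision and never locks.
final class LatestRevisionStore {
    // dn -> attribute -> values; only the latest state is kept
    private final Map<String, Map<String, Set<String>>> entries = new ConcurrentHashMap<>();

    /** Returns a copy of the current values; nothing prevents later changes. */
    Set<String> read(String dn, String attr) {
        Map<String, Set<String>> entry = entries.getOrDefault(dn, Map.of());
        return new HashSet<>(entry.getOrDefault(attr, Set.of()));
    }

    /** Replaces all values of one attribute, like a Modify 'replace'. */
    void replace(String dn, String attr, Set<String> values) {
        entries.computeIfAbsent(dn, k -> new ConcurrentHashMap<>()).put(attr, new HashSet<>(values));
    }

    /** Two clients update the same group from stale reads; returns the final members. */
    static Set<String> lostUpdateDemo() {
        LatestRevisionStore store = new LatestRevisionStore();
        String group = "cn=admins,ou=groups,dc=example,dc=com";
        store.replace(group, "member", Set.of("uid=alice"));

        Set<String> seenByA = store.read(group, "member"); // client A reads {alice}
        Set<String> seenByB = store.read(group, "member"); // client B reads {alice}

        seenByB.add("uid=bob");
        store.replace(group, "member", seenByB);            // stored: {alice, bob}

        seenByA.add("uid=carol");
        store.replace(group, "member", seenByA);            // stored: {alice, carol}, bob is lost
        return store.read(group, "member");
    }
}
```

An attribute-level Modify `add` of a single value would not lose Bob in this particular case; the hazard remains whenever a client computes the new values from what it read earlier.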


<digression>

We have to remember here that we are not dealing with bank accounts and balances, or nuclear plants. In most cases, we are using an LDAP server to handle identities. They rarely change, or when they do, it's because the person owning this identity is changing his own identity, thus limiting the odds that he is using it at the same time. Usually, if we think about authorizations, which are subject to way more changes than identities, we can't consider them a continuous flow of modifications.

In other words, creations or deletions of entries might be frequent, but modifications should be quite rare.

I have gathered some stats from some of my clients, and the ratio of changes to reads is about 1/5000... I would be interested in getting more numbers here!

Last but not least, I consider that if the ratio goes up to something like 1/100, then it's time to consider using a transactional system, namely an RDBMS.

</digression>

So, I'm not saying that it's wrong to think about using an MVCC system, but I'm a bit skeptical about the gain in terms of isolation (the I in ACID) it will offer in our case.


Let's discuss this anyway, it's interesting!

--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org
