On Fri, Feb 4, 2011 at 10:47 AM, Howard Chu <[email protected]> wrote:
> Alex Karasulu wrote:
>>
>> Hi there Howard!
>>
>> On Thu, Feb 3, 2011 at 9:56 PM, Howard Chu <[email protected]> wrote:
>>>
>>> Alex Karasulu wrote:
>>>>
>>>> FYI
>>>>
>>>> Hurray! Our respected friends at OpenLDAP are completing the
>>>> transaction spec. Nice to know of an existing implementation. Here's a
>>>> recent thread that started up on it:
>>>>
>>>> http://www.openldap.org/lists/openldap-devel/201102/msg00005.html
>>>>
>>>> Would be interesting to see how their implementation of transactions
>>>> combines with syncRepl now in the picture. Specifically, I'm wondering
>>>> if replication will trigger on transaction boundaries, rather than on
>>>> each change in the transaction. Also wondering how change sequence
>>>> numbers will be impacted.
>>>
>>> I'm wondering too! Many open questions with this spec. Though I'll note,
>>> RFC4533 explicitly states (of syncrepl) "This protocol is not intended to
>>> be used in applications requiring transactional data consistency." (Section
>>> 1.2)
>>
>> I was hoping you guys already figured that all out :-).
>>
>>> If folks are looking for transactional consistency in replication, we
>>> should probably develop a new spec to address that.
>>
>> Seems so now, thanks for the heads up.
>
> Syncrepl only promises eventual convergence, so there's really no reasonable
> way to expect transactional consistency from it. Consider a replica
> operating in refreshOnly mode, polling once every few minutes - between
> refreshes, it's out of date anyway. When it pulls down a refresh, it will be
> receiving entries one at a time; they could represent completed multi-entry
> transactions or not, and any client querying the replica will see in-between
> state during that refresh.
Yes, it's prone to dirty reads. There might be a way to work around this and actually obtain proper isolation; however, it still requires transaction awareness in replication. Let me try to explain below.

At one point, I investigated writing an optimistic local transaction manager with MVCC right above the partitions (analogous to OpenLDAP backends). This way all partitions gain the MVCC capability without having to implement it themselves. With MVCC you get a versioned DIB, which lends itself to better isolation. Incidentally, it has some other advantages when combined with a long-term change log, like snapshotting. Once a correlation is established between the version numbers, transaction identifiers, and change sequence numbers (some of which may be combined into the same number/id), you can obtain complete local transaction isolation. All writes are applied to the transaction log until the transaction commits.

With respect to vanilla syncrepl, this has some implications. The server polling for changes (say A) from another server (say B) will not see intermediate updates during the course of a transaction, so there will be no dirty reads from A->B. However, a client C reading from A can still encounter dirty reads. This is because server A is not aware that the changes being pulled down from B must be applied in a transaction.

> You could conceivably try to make refreshAndPersist transactional - during
> the persist phase, you can send along the transaction controls with the
> entry updates. Since basically the slapd implementation is to queue up all
> operations of the transaction until the final Commit is received, and then
> write all at once, this will impose some noticeable latency between the
> provider and the consumer. The Consumer would then do the same, queuing up
> all the received writes until it also receives the Commit message from the
> provider.
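Below is a minimal single-threaded sketch of that consumer-side queuing, combined with an MVCC-style versioned store, to show why a client C reading from the consumer would not see in-between state. It assumes the consumer receives updates tagged with a transaction id, followed by a Commit or an abort. `MVCCStore`, `ReplicationConsumer`, and the message shapes are hypothetical names for illustration. They are not slapd, ApacheDS, or RFC 4533 APIs, and the integer versions only stand in for CSNs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch: not a slapd/ApacheDS API. Single-threaded only.


class MVCCStore:
    """Versioned DIB: each DN maps to an ordered list of (version, entry)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, Optional[dict]]]] = {}
        self.current_version = 0  # last committed version (stand-in for a CSN)

    def snapshot(self) -> int:
        # A reader pins the latest committed version when it starts.
        return self.current_version

    def read(self, dn: str, version: int) -> Optional[dict]:
        # Return the newest value written at or before `version`.
        result = None
        for v, entry in self._versions.get(dn, []):
            if v <= version:
                result = entry
            else:
                break  # versions are appended in increasing order
        return result

    def commit(self, writes: Dict[str, Optional[dict]]) -> int:
        # Every write of one transaction gets the same new version, so new
        # snapshots see all of them together, and older snapshots see none.
        new_version = self.current_version + 1
        for dn, entry in writes.items():
            self._versions.setdefault(dn, []).append((new_version, entry))
        self.current_version = new_version
        return new_version


class ReplicationConsumer:
    """Queues replicated writes per transaction until the provider's Commit."""

    def __init__(self, store: MVCCStore) -> None:
        self.store = store
        self._pending: Dict[str, Dict[str, Optional[dict]]] = {}

    def on_update(self, txn_id: str, dn: str, entry: Optional[dict]) -> None:
        # Buffer the write. Nothing touches the store yet.
        self._pending.setdefault(txn_id, {})[dn] = entry

    def on_commit(self, txn_id: str) -> int:
        # Apply the whole queued transaction as one new version.
        return self.store.commit(self._pending.pop(txn_id, {}))

    def on_abort(self, txn_id: str) -> None:
        # Assumed behavior on a broken link or provider abort: drop the
        # partial transaction so it is never visible.
        self._pending.pop(txn_id, None)


if __name__ == "__main__":
    store = MVCCStore()
    consumer = ReplicationConsumer(store)
    consumer.on_update("t1", "uid=a,ou=people", {"balance": 90})
    consumer.on_update("t1", "uid=b,ou=people", {"balance": 110})

    before = store.snapshot()  # client C reads mid-transaction
    print(store.read("uid=a,ou=people", before))  # None

    consumer.on_commit("t1")
    after = store.snapshot()
    print(store.read("uid=a,ou=people", after), store.read("uid=b,ou=people", after))
    # {'balance': 90} {'balance': 110}
```

The design choice here is that visibility is controlled by one version bump at commit time rather than by locking. That is the "become visible atomically" property, at the cost of the latency Howard mentions while the consumer waits for the Commit.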
> So assuming perfect networks and perfect hardware and software,
> you could propagate the transactional state down the line. The changes would

And this gives us the transaction boundary we need for writing changes on the server receiving replication updates. Presuming A receives updates this way and persists the changes within transactions, a client C reading from A would not encounter dirty reads.

> become visible atomically, but at staggered intervals relative to the
> execution time on the original server. But if any network link is broken,
> the consumer has to catch up again by using a refresh phase, and the
> transactional consistency is lost during that time.

With fully isolated local transactions that roll back on failure, won't we be safe in this refresh-to-catch-up situation?

> I guess for delta syncrepl, since we record changes in a log and play them
> back in a known order, we could preserve the transactional state as well.

Won't you still need to isolate changes while replaying a transaction from the log, up to the point where the commit completes?

I perused the syncrepl spec, but I'm nowhere near prepared to reason about how to change it. I really appreciate your clarifications here.

Right now we're cleaning house, trying to get ApacheDS 2.0 out the door. Once 2.0 GA is out, I want to fully implement an MVCC transaction manager and at that point figure out how it cooperates properly with syncrepl. I want to make sure we speak the same language and that, hopefully, a heterogeneous OpenLDAP/ApacheDS cluster can be achieved.

Thanks,
Alex
