On 09/02/2009, at 4:06 PM, Damien Katz wrote:
I see the critical problem being with consistent updates of
replication. Unless you do it in one big transaction, the intermediate
replication states of the database are inconsistent, so the target
database is unusable during replication.
Absolutely, which is a valid use-case, especially (maybe only if...)
you can roll back a replication.
A bulk transaction is limited in how many docs it can handle, so it
only works for smallish databases. That alone means MVCC replication
isn't useful in the general case.
The current replicator could be made MVCC aware in a conceptually
simple manner:
source:
  loop
    cycle_end = update_seq from the current MVCC state.
    send each replication record to the replication stream until
      update_seq > cycle_end
    send a 'commit point' in the replication stream
    if no records were sent, close the replication stream
    restart this loop from cycle_end + 1
  end
  note that the replication stream will always end with a commit
  point modulo comms failure.

target:
  if not configured for consistent replication, (globally, or on a
  per-request basis)
    do as you currently do, ignoring commit points
  else
    rollback = current MVCC commit point
    loop
      apply replication updates until
        end =>
          if updates were applied
            rollback
          return
        a commit point arrives =>
          rollback = current MVCC commit point
    end
Maybe (and I'm not sure of this) the db could maintain a last-known-
good (wrt replication) MVCC commit point, so that the decision to
rollback could be deferred.
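
For concreteness, here is a minimal Python sketch of the source and target loops above. It is an illustration only, not CouchDB code: the in-memory list `log`, the dict `db`, the `COMMIT_POINT` marker and both function names are made up for the example, and a plain dict copy stands in for an MVCC commit point.

```python
COMMIT_POINT = "commit"  # hypothetical marker record in the replication stream


def source_stream(log, since=0):
    """Yield replication records from `log`, a list of (doc_id, body) pairs.

    The record at index i is assumed to have update_seq i + 1. The list may
    grow while the stream is being consumed; each cycle only sends records
    up to the update_seq observed when that cycle started.
    """
    next_seq = since + 1
    while True:
        cycle_end = len(log)  # update_seq of the current snapshot
        sent = 0
        for seq in range(next_seq, cycle_end + 1):
            doc_id, body = log[seq - 1]
            yield ("doc", doc_id, body)
            sent += 1
        yield (COMMIT_POINT,)
        if sent == 0:
            return  # nothing new in this cycle: close the stream
        next_seq = cycle_end + 1


def apply_consistently(db, stream):
    """Apply `stream` to the dict `db`, rolling back any uncommitted tail.

    Returns True if the stream ended on a commit point, False if updates
    after the last commit point had to be rolled back.
    """
    rollback = dict(db)  # stand-in for the current MVCC commit point
    dirty = False
    for record in stream:
        if record[0] == COMMIT_POINT:
            rollback = dict(db)
            dirty = False
        else:
            _, doc_id, body = record
            db[doc_id] = body
            dirty = True
    if dirty:  # stream ended without a commit point (e.g. comms failure)
        db.clear()
        db.update(rollback)
    return not dirty
```

A stream that is cut off before its commit point leaves `db` exactly as it was at the last commit point, which is the property the target side needs. Copying the whole dict is obviously not how a real rollback would work; it only keeps the sketch short.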
You'll also need to serialize the updates to the database in the
application layer and add the conflict checking there. That will
give you the desired transaction semantics.
Serializing updates is a very poor alternative to a transactional API,
and requiring that the application layer do that is awful, because
there may be multiple independent applications accessing the db.
If you build what I've described, and assuming you can live with the
limitations, you will have an always-consistent one-way replication
document distribution platform.
With a user-level transactional API (e.g. exposing the MVCC commit) and
the simple (on the source side) replication change, this is also
possible without requiring work in the application layer, and with IMO
considerably wider coverage of the use-cases.
Once again, modulo the cluster-ACID issue.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Borrow money from pessimists - they don't expect it back.
-- Steven Wright