On 09/02/2009, at 4:06 PM, Damien Katz wrote:

I see the critical problem being with consistent updates of replication. Unless you do it in one big transaction, the intermediate replication states of the database are inconsistent, so the target database is unusable during replication.

Absolutely, and that is a valid use-case, especially if (maybe only if...) you can roll back a replication.

A bulk transaction is limited in how many docs it can handle, so it only works for smallish databases. That alone means MVCC replication isn't useful in the general case.

The current replicator could be made MVCC aware in a conceptually simple manner:

source:

  loop
    cycle_end = update_seq from the current MVCC state.
    send each replication record to the replication stream until update_seq > cycle_end
    send a 'commit point' in the replication stream
    if no records were sent, close the replication stream
    restart this loop from cycle_end + 1
  end

Note that the replication stream will always end with a commit point, modulo comms failure.
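
A minimal Python sketch of the source loop above, in case it helps to see it run. Everything named here is hypothetical and not part of any CouchDB API: `read_update_seq` stands in for reading update_seq from the current MVCC state, `changes_between` stands in for reading the replication records in a sequence range, and the `("doc", ...)` / `("commit", ...)` tuples are just one way to mark commit points in the stream.

```python
from typing import Callable, Iterable, Iterator, Tuple


def replication_stream(
    read_update_seq: Callable[[], int],
    changes_between: Callable[[int, int], Iterable[object]],
    start_seq: int = 1,
) -> Iterator[Tuple[str, object]]:
    """Yield ("doc", record) items, with a ("commit", seq) after each cycle."""
    next_seq = start_seq
    while True:
        # Pin the end of this cycle to the source's current update_seq.
        cycle_end = read_update_seq()
        sent = 0
        for record in changes_between(next_seq, cycle_end):
            yield ("doc", record)
            sent += 1
        # Everything up to cycle_end has been sent: mark a commit point.
        yield ("commit", cycle_end)
        if sent == 0:
            # Nothing new arrived during the last cycle, so close the stream.
            return
        next_seq = cycle_end + 1
```

Because the commit point is emitted before the "no records" check, a stream that closes normally always has a commit point as its last item, even when the source was written to during the first cycle.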

target:

  if not configured for consistent replication (globally, or on a per-request basis)
    do as you currently do, ignoring commit points
  else
    rollback = current MVCC commit point
    loop
      apply replication updates until
        end =>
          if updates were applied since the last commit point
            roll back to rollback
          return
        a commit point arrives =>
          rollback = current MVCC commit point
    end
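
The target side can be sketched the same way. This is only an illustration under stated assumptions: `ToyDB` is a made-up in-memory stand-in in which a "commit point" is a copy of the documents, the stream items are hypothetical `("doc", (doc_id, body))` and `("commit", seq)` tuples, and a comms failure is modelled as the stream either ending early or raising an exception.

```python
class ToyDB:
    """Toy stand-in for an MVCC store; a commit point is a copy of the docs."""

    def __init__(self):
        self.docs = {}

    def commit_point(self):
        return dict(self.docs)

    def rollback_to(self, point):
        self.docs = dict(point)

    def apply(self, record):
        doc_id, body = record
        self.docs[doc_id] = body


def consume(db, stream, consistent=True):
    """Apply a replication stream, optionally honouring commit points."""
    if not consistent:
        # Current behaviour: apply every update and ignore commit points.
        for kind, payload in stream:
            if kind == "doc":
                db.apply(payload)
        return
    rollback = db.commit_point()
    dirty = False  # True while updates are applied but not yet committed
    try:
        for kind, payload in stream:
            if kind == "doc":
                db.apply(payload)
                dirty = True
            else:
                # A commit point arrived: this state is the new rollback target.
                rollback = db.commit_point()
                dirty = False
    finally:
        # Stream ended (or failed) after uncommitted updates: undo them.
        if dirty:
            db.rollback_to(rollback)
```

For example, feeding `consume` a stream that delivers one document, a commit point, and then a second document before the link drops leaves only the first document in `db.docs`.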

Maybe (and I'm not sure of this) the db could maintain a last-known-good (wrt replication) MVCC commit point, so that the decision to roll back could be deferred.
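
One possible reading of that idea, sketched in Python with entirely hypothetical names: the db records the last-known-good state whenever a commit point arrives, and whether to roll back is decided later rather than the moment the stream ends.

```python
class DeferredRollbackDB:
    """Toy store that remembers its last-known-good replication state."""

    def __init__(self):
        self.docs = {}
        self.last_known_good = {}

    def apply(self, doc_id, body):
        self.docs[doc_id] = body

    def mark_commit_point(self):
        # Called when a commit point arrives in the replication stream.
        self.last_known_good = dict(self.docs)

    def rollback_if_needed(self):
        """Roll back to the last-known-good state; report whether we did."""
        if self.docs == self.last_known_good:
            return False
        self.docs = dict(self.last_known_good)
        return True
```

Here a failed replication can leave `docs` ahead of `last_known_good` for a while, and `rollback_if_needed` can be called whenever a consistent state is actually required.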

You'll also need to serialize the updates to the database in the application layer and add the conflict checking there. That will give you the desired transaction semantics.

Serializing updates is a very poor alternative to a transactional API, and requiring that the application layer do that is awful, because there may be multiple independent applications accessing the db.

If you build what I've described, and assuming you can live with the limitations, you will have an always-consistent, one-way replication document distribution platform.

With a user-level transactional API (e.g. exposing the MVCC commit) and the simple source-side replication change, this is also possible without requiring work in the application layer, and IMO with considerably wider coverage of the use-cases.

Once again, modulo the cluster-ACID issue.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Borrow money from pessimists - they don't expect it back.
  -- Steven Wright

