Please let's stop using the word 'transactional'. Monotonic Writes requires nothing more 'transactional' than CouchDB already has, e.g. stable storage. The word 'transaction' is commonly used to mean user-level ACID semantics, which neither the Bayou nor the PRACTI model provides.

On 16/02/2009, at 3:55 PM, Chris Anderson wrote:

So it seems as though, when a long history is replicated under your
model (interleaving many different client updates) we would end up
sending a lot more data over the wire under your proposed model.

With the tradeoff that you get Monotonic Writes. Whether you see a lot more data depends on the frequency of replication relative to writes, and on the distribution of writes. Clustered writes with isolation group optimization (i.e. done by the protocol, not user-initiated) would end up sending little, if any, more data than is currently sent. Furthermore, Monotonic Writes might allow you to do differential encoding of subsequent revisions. This could be a fantastic win that would reduce the amount of data sent, even compared to the current protocol, especially for attachments.

In order to ensure that the isolation group stays together, even should
replication fail before completion, we'd have to send the latest
doc-rev for every doc touched in each isolated doc group.

In order to get Monotonic Writes you need to do that, and it's independent of isolation groups. Isolation groups are a feature that allows you to send *less* data. Exposing them to the user is entirely another question.

In the current system we just send the latest non-conflicted rev or
all the conflict revs if they exist. It makes for a lot less data on
the wire. (Correct me if I'm wrong.)

Correct, although incremental replication creates states that don't provide a Monotonic Write guarantee.
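
To make that concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which the source keeps an ordered write log and the target is just a map from doc id to the sequence number of the rev it holds. The function names and the log layout are hypothetical, not CouchDB's actual replicator.

```python
from typing import Dict, List, Tuple

# Hypothetical write log on the source: (update sequence, doc id), oldest first.
Write = Tuple[int, str]


def replicate_in_write_order(log: List[Write], fail_after: int) -> Dict[str, int]:
    """Apply writes in source order; the link drops after `fail_after` transfers."""
    target: Dict[str, int] = {}
    for seq, doc_id in log[:fail_after]:
        target[doc_id] = seq
    return target


def replicate_latest_only(log: List[Write], fail_after: int) -> Dict[str, int]:
    """Send only each doc's latest rev, ordered by that rev's sequence number."""
    latest: Dict[str, int] = {}
    for seq, doc_id in log:
        latest[doc_id] = seq  # later writes overwrite earlier ones
    ordered = sorted(latest.items(), key=lambda item: item[1])
    return dict(ordered[:fail_after])


def satisfies_monotonic_writes(log: List[Write], target: Dict[str, int]) -> bool:
    """True if no write is visible on the target without every earlier write
    (or a newer rev of the same doc) also being visible."""
    if not target:
        return True
    newest_seen = max(target.values())
    return all(
        target.get(doc_id, 0) >= seq
        for seq, doc_id in log
        if seq <= newest_seen
    )


# A post, a comment on it, then an edit to the post.
log = [(1, "post"), (2, "comment"), (3, "post")]
partial_latest = replicate_latest_only(log, fail_after=1)
partial_ordered = replicate_in_write_order(log, fail_after=1)
```

In this model, an interrupted latest-only run leaves the target holding the comment without its post, while an interrupted in-order run always leaves a prefix of the source's history.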

Your story about comments being replicated without their associated
posts is a good example of the counter-intuitive things that can
happen when replication fails before completion. Thanks for that.

The current replication implementation, not replication per se.

I think these questions are interesting, I really do. However, in my
mind, what makes CouchDB relaxing, is that we're not trying to be
ambitious on the transactional guarantees front. So far, we've tried
to give only the guarantees we know we can afford to give, and
concentrate on getting them right.

It isn't clear that the tradeoff needs to be forced. A system that provides Monotonic Writes can easily optimize for bandwidth, either adaptively or via configuration, but the reverse is not true.

One example of adaptive optimization is automatically increasing the size of the isolation groups depending on the measured performance characteristics of the channel, and the size of the data.
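
A rough Python sketch of one such heuristic follows. The function name, the parameters and the "time budget per group" policy are all assumptions made for illustration; nothing like this exists in CouchDB today.

```python
def choose_group_size(
    bytes_per_second: float,
    avg_doc_bytes: float,
    target_seconds: float = 2.0,  # hypothetical: how long one group may take to send
    max_group: int = 1000,        # hypothetical upper bound on docs per group
) -> int:
    """Pick how many docs to send as one isolation group.

    The group grows with measured channel throughput and shrinks as the
    documents get larger, so each group takes roughly `target_seconds`.
    """
    if bytes_per_second <= 0 or avg_doc_bytes <= 0:
        return 1  # no usable measurement yet: fall back to one doc at a time
    budget_bytes = bytes_per_second * target_seconds
    return max(1, min(max_group, int(budget_bytes // avg_doc_bytes)))
```

On a fast link with small documents this settles at the upper bound, and on a slow link with large attachments it degrades to one document per group.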

You can configure a system that can provide Monotonic Write guarantees to not do so.

Robert's point that much of this can be implemented on top of CouchDB
is an interesting one. If it is indeed the case, then the question
becomes whether clients or the database should be responsible for
providing the transactional API.

I'm still processing Robert's point in the context of the papers, but I'm not sure that it's true that it can be done without modification to CouchDB. It may not be practical to carry session-long version vectors in a lightweight client. I'm more certain that it can't be done in the context of Partial Replication. In any case, this can be done once, efficiently, in the server, rather than inefficiently (if at all) in a lightweight client.

In the Bayou papers the sessions need to be persistent - the Bayou context is an explicit client-server model with a persistent client.
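
For reference, the state such a session has to carry looks roughly like the following Python sketch. It assumes the usual version-vector formulation (one counter per server), and the names are hypothetical rather than taken from Bayou or CouchDB.

```python
from typing import Dict

# Hypothetical representation: server id -> highest write counter seen from it.
VersionVector = Dict[str, int]


def record_write(session: VersionVector, server_id: str, counter: int) -> None:
    """Remember, for the lifetime of the session, that this write was accepted."""
    session[server_id] = max(session.get(server_id, 0), counter)


def dominates(server: VersionVector, session: VersionVector) -> bool:
    """A server may accept the session's next write only if it already holds
    every write the session has made so far."""
    return all(server.get(sid, 0) >= counter for sid, counter in session.items())


session: VersionVector = {}
record_write(session, "A", 3)
up_to_date = dominates({"A": 3, "B": 1}, session)
lagging = dominates({"A": 2, "B": 5}, session)
```

The vector has to survive for as long as the session does, and it gains an entry for every server the client writes through, which is the burden a lightweight client would have to bear.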

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Reflecting on W.H. Auden's contemplation of 'necessary murders' in the Spanish Civil War, George Orwell wrote that such amorality was only really possible, 'if you are the kind of person who is always somewhere else when the trigger is pulled'.
  -- John Birmingham, "Appeasing Jakarta"

