On 06/02/2009, at 6:20 AM, Chris Anderson wrote:
> Antony, maybe it would help for you to explain exactly what you
> wouldn't be able to do without the bulk docs API. It will help to
> inform people about the technical issue.
My original email included this:
-------------------------------------------------------
For example, I have documents that can be cloned. The cloned document
contains a reference to the originating document. When I delete the
original document, the clone's history needs to be updated to remove
the reference to the original document and replace it with an
original-deleted history item.
There is a business case that requires this consistency.
With a transactional API this is easy. Without it, I can't see a way
to maintain consistency in the face of concurrent application access
and/or failure.
-------------------------------------------------------
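To make the clone example concrete, here is a minimal sketch of the two writes that have to land together. The `{"docs": [...]}` body and the `_deleted` flag follow the shape of a bulk docs request, but the `history`, `ref` and `type` fields are hypothetical names chosen for illustration, and the sketch assumes the server applies the whole batch as one unit, which is exactly the behaviour under discussion.

```python
def build_delete_original_batch(original, clone):
    """Build one bulk update that deletes `original` and rewrites `clone`.

    Assumption: the batch is applied atomically by the server.
    `history`, `ref` and `type` are hypothetical field names.
    """
    # Deletion stub for the original document.
    deleted = {"_id": original["_id"], "_rev": original["_rev"], "_deleted": True}

    # Swap the reference to the original for an original-deleted item.
    history = [
        {"type": "original-deleted"} if item.get("ref") == original["_id"] else item
        for item in clone["history"]
    ]
    updated_clone = dict(clone, history=history)

    return {"docs": [deleted, updated_clone]}


original = {"_id": "doc-1", "_rev": "1-abc"}
clone = {
    "_id": "doc-2",
    "_rev": "1-def",
    "history": [{"type": "cloned-from", "ref": "doc-1"}],
}
batch = build_delete_original_batch(original, clone)
```

If the two documents in `batch["docs"]` can be committed separately, nothing in the payload itself keeps the clone's history in step with the deletion.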
However, I don't think this is really about a specific example.
The problem is that if you get one side of the relationship written
and visible, but the other side not, then other concurrent accessors
will see a partially successful update.
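The partial-update problem can be shown with a small in-memory model. This is only a sketch: plain dicts stand in for the database, `ref` and `history` are hypothetical field names, and each returned state represents what a concurrent reader could observe at that moment.

```python
def dangling_refs(docs):
    """Return ids of documents whose 'ref' names a document that is missing."""
    return sorted(
        d["_id"] for d in docs.values()
        if d.get("ref") is not None and d["ref"] not in docs
    )


def fixed_clone(clone_id):
    """The clone after its reference is replaced by an original-deleted item."""
    return {"_id": clone_id, "ref": None, "history": ["original-deleted"]}


def two_step(docs, original_id, clone_id):
    """States visible to readers when the two writes land separately."""
    after_delete = {k: v for k, v in docs.items() if k != original_id}
    after_fix = dict(after_delete)
    after_fix[clone_id] = fixed_clone(clone_id)
    return [after_delete, after_fix]


def atomic(docs, original_id, clone_id):
    """States visible to readers when both writes land as one unit."""
    after = {k: v for k, v in docs.items() if k != original_id}
    after[clone_id] = fixed_clone(clone_id)
    return [after]


docs = {"orig": {"_id": "orig"}, "clone": {"_id": "clone", "ref": "orig"}}
separate_view = [dangling_refs(s) for s in two_step(docs, "orig", "clone")]
atomic_view = [dangling_refs(s) for s in atomic(docs, "orig", "clone")]
```

In the two-step case the first visible state contains a clone pointing at a document that no longer exists; in the atomic case no such state is ever visible.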
One response is "but you'll see this problem during replication", but
I think this is making a big assumption about how replication is
managed/interleaved with local application behaviour.
Replication, and dealing with conflicts, is in no way automatic. As
others have stated, there is no domain-independent way of resolving
conflicts.
Surely if it were possible to build a transactional API on top of a
conflict-based system, then this statement would not be true?
I am deploying CouchDB like a Notes CLIENT, not as a high-performance
database server. Replication is an explicit operation that halts
normal activity. For my first delivery, replicas are read-only, so
replication conflict isn't possible. When I move to a distributed
writers scenario, resolving replication conflicts will involve a
specialized UI that allows all conflicts to be resolved before normal
operation resumes. Thus the editing application always sees a
conflict-free database.
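The "resolve everything before resuming" rule described above could be gated by a check along these lines. It is a sketch only: it assumes the documents were already fetched with conflict information included, so that conflicted documents carry a `_conflicts` list of revision ids, and the function names are hypothetical.

```python
def unresolved_conflicts(docs):
    """Map each conflicted document id to its list of conflicting revisions."""
    return {d["_id"]: d["_conflicts"] for d in docs if d.get("_conflicts")}


def may_resume_editing(docs):
    """Normal operation resumes only once no document reports a conflict."""
    return not unresolved_conflicts(docs)


after_replication = [
    {"_id": "a", "_rev": "2-x"},
    {"_id": "b", "_rev": "3-y", "_conflicts": ["3-z"]},
]
pending = unresolved_conflicts(after_replication)
```

Here `pending` lists the documents the resolution UI would still need to present, and `may_resume_editing` stays false until that list is empty.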
The use case of someone doing a local operation, e.g. submitting a web
form, is very different from resolving replication conflicts. Conflict
during a local operation is a matter of application concurrency,
whereas conflict during replication is driven by the overall system
model. It has different temporal, administrative and UI boundaries.
In short, I think it is a mistake to try to hide the different
characteristics of local (even clustered) operations and replication.
You may disagree, but if the system distinguishes between these two
fundamentally different things (distinguished by their
partition-tolerance), you can still code as though every operation
leads to conflict if you wish. If it doesn't, I can't take advantage
of the difference.
> I know that the long-standing vision of Couch doesn't include special
> API exceptions for when you are running on a single node. And I'm a
> little afraid that the transactional doc commits Antony wants us to
> keep are only a mirage, which would lead to trouble anyway when
> distributed systems are involved.
I don't understand why this needs to be the case. You can do
transactions in distributed systems. Do you have a model that isn't
amenable to a Scalaris treatment? Especially given that we're only
talking about transactions over a set of processes that are providing
an illusion of a single system. Such a cluster already requires some
degree of partition-tolerance, right? And if not, then what
distinguishes a cluster from a partition-tolerant p2p mesh?
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
The fact that an opinion has been widely held is no evidence whatever
that it is not utterly absurd.
-- Bertrand Russell