Re: Transactional _bulk_docs

Jan Lehnardt Fri, 06 Feb 2009 00:13:51 -0800

Hi Paul,

thanks for nailing the history of this.



Can we now please just wait for Damien's patch to arrive
before starting the discussion? This is a bad situation, but I'd like to
move forward, and I don't want to spend any more time on
the forensics of how things came to be. I hope it is now
pretty clear that our modus operandi for some of the
fundamental work that Damien is doing is a little
controversial, but the PMC agrees it is good for the
software, and so far everybody else* has agreed. We'll work
on communicating ongoing work to the community.

The technical issues will be talked about when we know
what Damien's patch does in detail. Thanks, Antony, for
explaining again what your use-case is. This information
will be a valuable addition to the discussion to come.

We're all in this to make really awesome software. Can
we get back to that?


Cheers
Jan
--
* By which I mean literally everybody: look at the
buzz CouchDB is creating in the industry.

On 6 Feb 2009, at 06:13, Paul Davis wrote:

On Thu, Feb 5, 2009 at 10:02 PM, Antony Blakey wrote:
On 06/02/2009, at 6:20 AM, Chris Anderson wrote:
Antony, maybe it would help if you explained exactly what you
wouldn't be able to do without the bulk docs API. It will help to
inform people about the technical issue.
My original email included this:

-------------------------------------------------------

For example, I have documents that can be cloned. The cloned document
contains a reference to the originating document. When I delete the original document, the clone history needs to be updated to remove the reference to the original document and replace it with an original-deleted history item.
There is a business case that requires this consistency.
With a transactional API this is easy. Without it, I can't see a way to maintain consistency in the face of concurrent application access and/or
failure.

-------------------------------------------------------

However, I don't think this is really about a specific example.
The problem is that if you get one side of the relationship written and visible, but the other side not, then other concurrent accessors will see a
partially successful update.
One response is "but you'll see this problem during replication",but I
think this is making a big assumption about how replication is
managed/interleaved with local application behaviour.
Replication, and dealing with conflicts, is in no way automatic. As others have stated, there is no domain-independent way of resolving conflicts.
Surely if it were possible to build a transactional API on top of a
conflict-based system, then this statement would not be true?

I am deploying CouchDB like a Notes CLIENT, not as a high-performance
database server. Replication is an explicit operation that halts normal activity. For my first delivery, replicas are read-only, so replication conflict isn't possible, but when I move to a distributed-writers scenario, resolving replication conflicts will involve a specialized UI that allows all conflicts to be resolved before normal operation resumes. Thus the
editing application always sees a conflict-free database.
The use-case of someone doing a local operation, e.g. submitting a web form, is very different from resolving replication conflicts. Conflict during a local operation is a matter of application concurrency, whereas conflict during replication is driven by the overall system model. It has different
temporal, administrative and UI boundaries.

In short, I think it is a mistake to try to hide the different
characteristics of local (even clustered) operations and replication. You
may disagree, but if the system distinguishes between these two
fundamentally different things (distinguished by their partition tolerance), you can still code as though every operation leads to conflict if you wish; if it doesn't, I
can't take advantage of the difference.
I know that the long-standing vision of Couch doesn't include special
API exceptions for when you are running on a single node. And I'm a
little afraid that the transactional doc commits Antony wants us to
keep are only a mirage, which would lead to trouble anyway when
distributed systems are involved.
I don't understand why this needs to be the case. You can do transactions in distributed systems. Do you have a model that isn't amenable to a Scalaris treatment? Especially given that we're only talking about transactions over a set of processes that are providing an illusion of a single system. Such a cluster already requires some degree of partition tolerance, right? And if not, then what distinguishes a cluster from a partition-tolerant p2p mesh?
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd.
-- Bertrand Russell
I'm upset that CouchDB doesn't make me coffee in the morning.

But the thing is, CouchDB is totally willing to make you coffee *and*
bacon. It loves you *that* much.

Enough with the silly. I've watched this drama avalanche for a while
and I finally think it's time for me to put out a few words on what
I've seen.

A brief history:
1. The mythical IRC conversation on 'removing' the feature (roughly quoted):
Damien: I don't think we can support transactional commits in the face
of multiple nodes. We can do ACID writes to disk so that updates
aren't lost, but checking with an unbounded number of servers that a
commit doesn't conflict isn't feasible.

Everyone else: That's pretty reasonable.

2. A patch was applied to trunk that made commits to CouchDB
optionally ACID compliant (giving users the traditional
speed/safety choice) and removed the atomic 'all or none'
semantics.

3. Huge ML threads.

History complete.

Current status (through my eyes):

Near as I can tell, Damien has been nose to the grindstone for quite
some time on this very specific part of the API. Would I like more
status updates and ideas on where he's heading? Of course. Do I trust
him? Yes. Is the community as a whole going to blindly accept some
asinine patch that has no value that removes a crap load of
functionality? No.

Controversy!

I tend to think that the 'discussion' that everyone keeps referring to
hasn't even occurred yet. I see the patch that caused all this as an
unfortunate early release.

What?!

Admissions first: I have no money riding on this issue. Whether or not
CouchDB has transactional _bulk_docs worries me not at all. Though, I
can't say that I have that much sympathy for a business model that
relies on an open source project's trunk to remain compatible with
required assumptions.

Break:

People seem to think that this conversation is over and done with. It
isn't. This is a part of the API that's under work and will change.

Reductio ad absurdum:
Do we require a mailing list thread for every character changed in the source?
HTH,
Paul Davis
