On Mon, May 25, 2009 at 1:44 AM, Scott Shumaker <sshuma...@gmail.com> wrote:
> Chris - you said:
> "Distributed bulk transactions would make for chaotic behavior, as
> someone's mostly unrelated change on a remote node could eventually
> replicate to me (months later) and knock an entire line of work that
> I've done into a conflict state."
>
> Months later? If you're really out of sync with a remote node for
> several months, you should expect to get lots of conflicts if people
> are editing your shared documents, with or without distributed bulk
> transactions.
>
I'm still mulling over Chris's concerns. My gut-brain tells me there's
a way around this, but the idea hasn't floated up to the same brain
that controls actual thoughts.

> And if you really have a system designed to be offline for several
> months, you're solving very different problems (albeit interesting
> ones) than traditional data 'replication' as most people think about
> it (when one entity controls the servers, even if they are
> geographically distributed) - and getting into DVCS-like territory.
>

Heh. I think even in DVCS territory there'd still probably be a murder
if a dev didn't push for a couple of months. Which is part of the idea
floating around, i.e., perhaps having a method for rejecting
replication that is too far out of whack. (Think of git's "You can't
push because it's not a fast-forward".) (This is all extreme hand
waving, so no one take it too seriously.)

> If that's the case, it does explain why there are some decisions in
> CouchDB I find strange, like the fact that authentication runs during
> replication. You're using replication for a completely different
> purpose than I need it for - I need it for redundancy and read
> scaling, you're using it to synchronize disparate data sources. Very
> different problems that call for very different solutions.
>

CouchDB has always had a focus on being able to run in a completely
decentralized manner, and there are features in CouchDB that support
this model. That said, it's also a goal to support as wide a range of
models as possible that fit into the general scheme of things. For
instance, if replication validation is something you want to be
configurable, you can create a ticket and a dev@ thread discussing the
issue. I don't see any problem with making that disable-able, but we
should discuss it on a dedicated thread for future-me's sanity.
HTH,
Paul Davis

> On Sun, May 24, 2009 at 10:31 PM, Scott Shumaker <sshuma...@gmail.com> wrote:
>> Inter-document dependencies come up pretty quickly when you start
>> trying to represent complex data structures in CouchDB. There are
>> still a few cases we've encountered where there isn't a great way to
>> avoid needing transactions. A few examples:
>>
>> 1)
>> 'A' maintains an ordered list of 'B' elements, where the order is
>> user-defined - imagine allowing a user to re-order photos in a
>> slideshow. You want to store the photos separately from the
>> slideshow, because they can be used in multiple slideshows. Whenever
>> you add or remove a photo, you need to update the order list as well.
>>
>> I've seen some people suggest a gross solution where you store
>> floating-point order ids inside the B elements and change them to
>> wedge an item in between two other items (averaging the two new
>> siblings' orders), but this is incredibly brittle and breaks down
>> with enough re-ordering.
>>
>> Another unpleasant approach is to create separate 'order objects' in
>> CouchDB (representing the order of an item within a folder), storing
>> an internal version number (timestamp) inside the 'order object' - so
>> you never change the order node, you just create a new node. Then,
>> you only use the 'latest' version of this order node (either on the
>> client side or with a reduce). To make this work, you need to ensure
>> that your 'internal version numbers' are monotonically increasing.
>> This isn't a problem for some applications, and can be solved in
>> general with a specialized 'number server'.
>>
>> 2)
>> Representing graph/linked-list data structures.
>>
>> If you delete a node from a linked list, you need to update two nodes
>> - the previous node and the node itself.
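Scott's versioned 'order object' approach can be sketched roughly as follows. This is a hypothetical in-memory model, not a real CouchDB API: `make_order_doc` and `current_order` are invented names, the counter stands in for the 'number server', and `current_order` plays the role of the latest-version-wins reduce:

```python
import itertools

# Monotonic counter standing in for the 'number server' / timestamp.
_version = itertools.count(1)

# Each 'order object' is immutable: to reorder, emit a new doc with a
# higher version instead of mutating the old one.
def make_order_doc(slideshow_id, photo_id, position):
    return {
        "slideshow": slideshow_id,
        "photo": photo_id,
        "position": position,
        "version": next(_version),
    }

def current_order(order_docs, slideshow_id):
    """Keep only the latest version per photo (what the reduce would do)."""
    latest = {}
    for doc in order_docs:
        if doc["slideshow"] != slideshow_id:
            continue
        photo = doc["photo"]
        if photo not in latest or doc["version"] > latest[photo]["version"]:
            latest[photo] = doc
    ordered = sorted(latest.values(), key=lambda d: d["position"])
    return [d["photo"] for d in ordered]

docs = [
    make_order_doc("show1", "photoA", 1),
    make_order_doc("show1", "photoB", 2),
    make_order_doc("show1", "photoA", 3),  # a later edit moves photoA last
]
print(current_order(docs, "show1"))  # ['photoB', 'photoA']
```

Because writers only ever create new order docs, no two clients contend on the same document; the cost, as noted, is that correctness hinges on the version numbers being globally monotonic.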
>> You can try the second
>> suggestion in the previous item to make this work (effectively, store
>> the link relationship as separate objects and generate new link
>> objects with incrementing version numbers).
>>
>> I'm sure there are other cases - these two just have been a thorn in
>> our side. But for a lot of non-trivial data applications,
>> transactions still end up being very useful.
>>
>> On Fri, May 22, 2009 at 2:45 PM, Nathan Stott <nrst...@gmail.com> wrote:
>>> As a user, when I chose couchdb for my most recent project, I chose it
>>> because I didn't care about transactions. I would've used an RDBMS if
>>> that were important.
>>> I chose it because couch solved the problems I needed solved very well.
>>>
>>> I don't think transactions should be a big dev focus.
>>>
>>> On Fri, May 22, 2009 at 4:30 PM, Chris Anderson <jch...@apache.org> wrote:
>>>
>>>> On Thu, May 21, 2009 at 8:30 PM, Yuval Kogman <nothingm...@woobling.org> wrote:
>>>> > 2009/5/21 Adam Kocoloski <kocol...@apache.org>:
>>>> >> Hi Yuval, thanks for this well-written proposal. I don't really want to
>>>> >> rehash all the discussion from back in February (see the thread beginning at
>>>> >> http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c84f66023-030a-4669-b75c-3dcc92d71...@yahoo.com%3e
>>>> >> for a particularly detailed discussion), but I do want to comment on
>>>> >> one aspect.
>>>> >>
>>>> >> Updating the replicator to be smart about atomic bulk transactions is
>>>> >> doable (although a major undertaking), but when you throw DB compaction
>>>> >> and revision stemming into the mix things get really hairy. Recall that
>>>> >> CouchDB revisions are used for concurrency control, not for maintaining
>>>> >> history.
>>>> >> Consider the following sequence of events:
>>>> >>
>>>> >> 1) Generate foo/1 and bar/1 in an atomic _bulk_docs operation
>>>> >> 2) Update foo -> foo/2
>>>> >> 3) Compact the DB (foo/1 is deleted)
>>>> >> 4) Start replicating to a mirror
>>>> >> 5) Replication crashes before it reaches foo/2
>>>> >
>>>> > By crash you mean an error due to a conflict between foo/2 and foo/1'
>>>> > (the mirror's version of foo), right?
>>>> >
>>>> >> In your proposal, we should expect foo/1 to exist on the mirror, right?
>>>> >> I think this means we'd need to modify the compaction algorithm to keep
>>>> >> revisions of documents if a) the revision was part of an atomic
>>>> >> _bulk_docs, and b) any of the documents in that transaction are still at
>>>> >> the revision generated by the transaction. The same thing goes for
>>>> >> revision stemming -- we can never drop revisions if they were part of an
>>>> >> atomic upload and at least one of the document revs in the upload is
>>>> >> still current.
>>>> >
>>>> > Yep. Personally I see this as a tradeoff, not a limitation per se. If
>>>> > you specify 'atomic' then you must pay more in terms of data size,
>>>> > performance, etc.
>>>>
>>>> The problem as I see it is that someone else's bulk transaction will
>>>> have to sit around in my database until I edit all the docs in it.
>>>> Hopefully I won't get any distributed conflicts on other old versions
>>>> of docs in the group, because this would make edits that I've done
>>>> locally to other documents in the bulk group somehow less valid.
>>>>
>>>> Distributed bulk transactions would make for chaotic behavior, as
>>>> someone's mostly unrelated change on a remote node could eventually
>>>> replicate to me (months later) and knock an entire line of work that
>>>> I've done into a conflict state.
>>>>
>>>> If you want atomicity, put it in a single document.
>>>>
>>>> Chris
>>>>
>>>> >
>>>> > In 0.8 you would have theoretically had to pay by default, but didn't
>>>> > because replication broke transactions.
>>>> >
>>>> > The basic algorithm is still the same, but the garbage-collected unit
>>>> > changes: instead of garbage collecting document revisions, it garbage
>>>> > collects revision sets, with the current case being a set with one
>>>> > member. The rules still apply (if this object is wholly shadowed by
>>>> > non-conflicting changes then it can be disposed of). IIRC the
>>>> > algorithm is a copying garbage collector, so this is pretty easy to do
>>>> > (you walk a DAG instead of a linked list).
>>>> >
>>>> > Under the proposed model you'd choose which operations are
>>>> > transactional and would have to pay for those.
>>>> >
>>>> > Anyway, thanks for your link as well. I was reading through a rather
>>>> > boring thread and didn't see this one, so I guess I did miss out. It
>>>> > seemed to imply the discussion was done only on IRC.
>>>> >
>>>> > Anyway, here goes...
>>>> >
>>>> > The fundamental problem is that any consistent data model needs at the
>>>> > very least to have atomic primitives and ordered message passing (with
>>>> > transactional message handlers) at the per-partition level, or
>>>> > atomicity and consistency are restricted to a single document.
>>>> >
>>>> > What concerns me is Damien's post
>>>> > (http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c451872b8-152c-42a6-9324-dd52534d9...@apache.org%3e):
>>>> >
>>>> >> No, CouchDB replication doesn't support replicating the transactions.
>>>> >> Never has, never will. That's more like the transaction log replication
>>>> >> that's in traditional dbs, a different beast.
>>>> >>
>>>> >> For the new bulk transaction model, I'm only proposing supporting
>>>> >> eventual consistency. All changes are safe to disk, but the db may not
>>>> >> be in a consistent state right away.
>>>> >
>>>> > From what I know, this assumption is wrong. Eventual consistency still
>>>> > needs atomic primitives; it's not about whether or not you have
>>>> > transactions, it's about what data they affect (eventual consistency
>>>> > involves breaking them down).
>>>> >
>>>> > Anyway, "never will" sounds pretty binding, but for the sake of argument:
>>>> >
>>>> > By using only insertions and idempotent updates for the bulk of the
>>>> > data changes, and a message queue whose handlers use atomic updates to
>>>> > integrate this data, one can implement a truly atomic distributed
>>>> > model, or eventual consistency; but without this, updates need to be
>>>> > restricted to exactly one document.
>>>> >
>>>> > Eventual consistency is still possible using either locks or by
>>>> > breaking down what would have been large distributed transactions into
>>>> > smaller ones, but the key is that the code that will make things
>>>> > actually consistent must still have ACID guarantees (and be dispatched
>>>> > in order).
>>>> >
>>>> > The 0.9 CouchDB model is effectively MyISAM without the data loss, but
>>>> > just because the data is around doesn't mean it's possible to know
>>>> > what to do with it (loss of context), or even fix it safely (the
>>>> > conflict resolution code is susceptible to conflicts too).
>>>> >
>>>> > Unfortunately, for eventual consistency to actually work, the breaking
>>>> > down of operations must be done at the application level; the database
>>>> > can't decide which data can be deferred and which data cannot.
>>>> >
>>>> > All immutable data and all new data can obviously be added to the
>>>> > database outside of a transaction, but eventually a transaction
>>>> > linking this data must be part of an atomic mutation.
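The pattern Yuval describes here (idempotent inserts for the bulk data, plus an ordered queue of handlers that each atomically update a single document) can be sketched like this. Everything is a hypothetical in-memory stand-in, not the real CouchDB API; `put_immutable`, `handle_messages`, and the doc shapes are invented for illustration:

```python
import hashlib
import json
from collections import deque

store = {"album1": {"type": "album", "links": []}}  # toy stand-in for the db
queue = deque()  # ordered message queue carrying the deferred link updates

def put_immutable(doc):
    """Idempotent insert: the id is a hash of the content, so replaying
    the same write is harmless (same id, same body)."""
    doc_id = hashlib.sha1(json.dumps(doc, sort_keys=True).encode()).hexdigest()
    store[doc_id] = doc
    return doc_id

def handle_messages():
    """Drain the queue in order; each handler atomically updates exactly
    one document, so no cross-document transaction is needed."""
    while queue:
        target_id, link_id = queue.popleft()
        store[target_id]["links"].append(link_id)

# Bulk/immutable data goes in outside any transaction...
photo_id = put_immutable({"type": "photo", "bytes": "abc"})
# ...and the mutation that links it is deferred to an ordered handler.
queue.append(("album1", photo_id))
handle_messages()
print(store["album1"]["links"] == [photo_id])  # True
```

The database is inconsistent only in the window between the insert and the handler run; since the handlers are ordered and touch one document each, replays and retries stay safe.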
>>>> >
>>>> > The only way to support this without atomic operations on a unit
>>>> > larger than a document is to have a "master" document for every
>>>> > transitive closure of the graph structure requiring consistency, which
>>>> > in effect only actually relates to immutable snapshot documents (e.g.
>>>> > where the ID is a hash of the data). If these closures overlap, then a
>>>> > single "master" for the whole graph will be needed.
>>>> >
>>>> > To illustrate, let's make up a social networking example. Let's say
>>>> > you are adding a friend on this social network, and that this
>>>> > operation involves 3 updates: one to add a link from your profile to
>>>> > your friend's ID, another for the inverse, and a third to send a
>>>> > "hello" message to the friend, updating their inbox. The first update
>>>> > lives in one partition, and the second and third updates are on a
>>>> > second one.
>>>> >
>>>> > The back pointers in your new friend's profile must be updated. In a
>>>> > fully transactional model this would lock the friend's document and
>>>> > yours at the same time; in an eventual consistency model this would
>>>> > queue a message for the friend's partition, and a message handler on
>>>> > the friend's partition would update this atomically, "eventually". It's
>>>> > fine for the link to be out of date for a while, but eventually it
>>>> > needs to be fixed (e.g. if you want to remove the friend, message
>>>> > them, etc.).
>>>> >
>>>> > In CouchDB 0.9 one of the writes will get a "conflict" error back, and
>>>> > they could refetch the updated version and try the edit again. The
>>>> > problem is that if they wrote the third update to another document on
>>>> > the same node making assumptions about the same data, that write may
>>>> > have succeeded, leaving the data inconsistent.
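The 0.9 behavior referred to here is CouchDB's MVCC check: an update must carry the document's current revision, and a stale write is rejected with a conflict that the client resolves by refetching and retrying. A minimal in-memory simulation of that loop (invented names, not the real HTTP API, revisions as plain integers rather than rev strings):

```python
class Conflict(Exception):
    pass

db = {}  # id -> (rev, doc): a toy stand-in for one CouchDB node

def put(doc_id, doc, rev=None):
    """Reject the write unless the caller holds the current rev (MVCC)."""
    current = db.get(doc_id)
    if current is not None and current[0] != rev:
        raise Conflict(doc_id)
    new_rev = (current[0] if current else 0) + 1
    db[doc_id] = (new_rev, doc)
    return new_rev

def update_with_retry(doc_id, mutate, attempts=5):
    """Refetch-and-retry loop: the per-document fix for lost updates.
    Note it does nothing for a *second* document already written on
    stale assumptions -- which is exactly Yuval's complaint."""
    for _ in range(attempts):
        rev, doc = db.get(doc_id, (None, {}))
        try:
            return put(doc_id, mutate(dict(doc)), rev)
        except Conflict:
            continue  # someone else won the race; reload and try again
    raise Conflict(doc_id)

put("profile:me", {"friends": []})
# A concurrent writer bumps the rev out from under us:
put("profile:me", {"friends": ["a"]}, rev=1)
rev = update_with_retry(
    "profile:me", lambda d: {**d, "friends": d["friends"] + ["b"]}
)
print(db["profile:me"][1]["friends"])  # ['a', 'b']
```

The retry keeps the single document consistent; the inconsistency Yuval worries about lives in any sibling document that was written based on the revision that lost the race.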
>>>> > Under an
>>>> > eventual consistency model you still use transactions to do these
>>>> > updates; you just must design your model to break them down into
>>>> > smaller units.
>>>> >
>>>> > The reason a graph structure is more susceptible to inconsistency is
>>>> > that while in a relational model many data-linkage operations can be
>>>> > done with a single insert/update (e.g. `insert into edges (node1_id,
>>>> > node2_id)`), in a document-based database this type of operation
>>>> > involves modifying all the affected documents. The chance of
>>>> > inconsistency is increased because contention is higher and there is
>>>> > more data that must be synchronized.
>>>> >
>>>> > However, in another post Damien said:
>>>> >
>>>> >> Which is why in general you want to avoid inter-document dependencies,
>>>> >> or be relaxed in how you deal with them.
>>>> >
>>>> > So I think I'd best shut up after this unless some decision maker
>>>> > tells me not to. If my use case is not covered by the intended
>>>> > design then that's that, but I do think this thread sort of covers
>>>> > this:
>>>> >
>>>> >> As far as distributed transactions go, I'd be thrilled if we could
>>>> >> implement it and also support the rest of couchdb, like views and bi-
>>>> >> directional replication. Please start up a discussion here in dev@
>>>> >> about it and see if you can work out a design.
>>>> >
>>>> > Without going too pie-in-the-sky.
>>>> >
>>>> > Cheers,
>>>> > Yuval
>>>> >
>>>>
>>>> --
>>>> Chris Anderson
>>>> http://jchrisa.net
>>>> http://couch.io
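Chris's advice earlier in the thread ("if you want atomicity, put it in a single document"), applied to Scott's slideshow example, might look roughly like this. The document shape is hypothetical and plain Python stands in for a CouchDB client:

```python
# One document owns the ordering, so a single (atomic) document write
# reorders the slideshow; the photos stay in their own documents.
slideshow = {
    "_id": "slideshow:holiday",
    "type": "slideshow",
    "photo_order": ["photo:a", "photo:b", "photo:c"],
}

def move_photo(doc, photo_id, new_index):
    """Return an updated copy with photo_id moved to new_index; saving
    the copy is one PUT, hence atomic for this slideshow."""
    order = [p for p in doc["photo_order"] if p != photo_id]
    order.insert(new_index, photo_id)
    return {**doc, "photo_order": order}

updated = move_photo(slideshow, "photo:c", 0)
print(updated["photo_order"])  # ['photo:c', 'photo:a', 'photo:b']
```

The tradeoff is the one the thread circles around: this keeps each slideshow internally consistent, but the membership of a photo in several slideshows is still an inter-document dependency.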