On Tue, Jan 29, 2013 at 7:44 PM, Nathan Vander Wilt < [email protected]> wrote:
> # The problem
>
> It's a fairly common "complaint" that CouchDB's database model does not
> support fine-grained control over reads. The canonical solution is a
> database per user:
> http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
> http://stackoverflow.com/a/4731514/179583
>
> This does not scale.
>
> 1. It complicates formerly simple backup/redundancy: now I need to make
> sure N replications stay working, N databases have correct permissions,
> instead of just one "main" database. Okay, write some scripts, deploy some
> cronjobs, can be made to work...
>
> 2. ...however, if data needs to be shared between users, this model
> *completely falls apart*. Bi-directional continuous filtered replication
> between a "hub" and each user database is extremely resource intensive.
>
> I naïvely followed the Best Practices and ended up with a system that can
> barely support 100 users to a machine due to replication overhead. Now if I
> want to continue doing it "The Right Way" I need to cobble together some
> sort of rolling replication hack at best.
>
> It's apparent the real answer for CouchDB security, right now, is to hide
> the database underneath some middleware boilerplate crap running as DB
> root. This is a well-explored pattern, by which I mean the database ends up
> with as many entry points as a sewer system has grates.
>
>
> # An improvement?
>
> What if CouchDB let you define virtual databases, that shared the
> underlying document data when possible, that updated incrementally (when
> queried) rather than continuously, that could even internally be
> implemented in a fanout fashion?
>
> - virtual databases would basically be part of the internal b-tree key
> hierarchy, sort of like multiple root nodes sharing the branches as much as
> possible
> - sharing the underlying document data would almost halve the amount of
> disk needed versus a "master" database storing all the data which is then
> copied to each user
> - updating incrementally would put less continuous memory pressure on the
> system
> - haven't actually done the maths, so I may be missing something, but
> wouldn't fanning out changes internally from a master database through
> intermediate partitions reduce the processing load?
>
> Basically, rather than each time a user updates a document, copying it to
> a master database, then filtering every M updates through N instances of
> couchjs; instead internally CouchDB could build a tree of combined filters
> — say, master database filters to log(N) hidden partitions at the first
> level and accepted changes would trickle through only relevant further
> layers. (In a way, this is kind of at odds with the incremental nature —
> maybe it does make sense to pay an amortized cost on write rather than on
> reads.)
>
>
> # The urgency
>
> Maybe this *particular* solution isn't really a solution, but we need one:
>
> If replicating amongst per-user databases is the only correct way to
> implement document-level read permissions, CouchDB **NEEDS** built-in
> support for a scalable way of doing so.
>
> There are plenty of other feature requests I could troll the list with
> regarding CouchApps. But this one is key; everything else I've been able to
> work around behind a little reverse proxy here and in front of an external
> process there. Without scalable read-level security, I see no particular
> raison d'être for Apache CouchDB — if CouchDB can't support direct HTTP
> access in production in general, then it's just another centralized
> database.
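
To put rough numbers on the "tree of combined filters" idea quoted above,
here is a minimal Python sketch. It is only an illustration under stated
assumptions, not anything CouchDB or rcouch implements: each user's filter
is reduced to a set of accepted tags, and Node, build_tree and route are
hypothetical names. A change descends only into partitions whose combined
filter accepts it, and the sketch counts how many filter checks that takes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A hidden partition (inner node) or one user's virtual database (leaf)."""
    tags: frozenset                 # combined filter: every tag accepted below this node
    user: Optional[str] = None      # set on leaves only
    children: list = field(default_factory=list)


def build_tree(user_tags, fanout=2):
    """Group nodes bottom-up; a parent's filter is the union of its children's.

    Assumes at least one user (hypothetical helper, not a CouchDB API).
    """
    level = [Node(frozenset(tags), user=name)
             for name, tags in sorted(user_tags.items())]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            kids = level[i:i + fanout]
            combined = frozenset().union(*(k.tags for k in kids))
            parents.append(Node(combined, children=kids))
        level = parents
    return level[0]


def route(root, doc_tag):
    """Return (users receiving the change, number of filter checks performed)."""
    receivers, checks, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        checks += 1
        if doc_tag not in node.tags:
            continue                # the whole subtree is rejected with one check
        if node.user is not None:
            receivers.append(node.user)
        else:
            stack.extend(node.children)
    return sorted(receivers), checks


# 64 hypothetical users; each accepts their own tag plus a shared "public" tag.
users = {f"u{i:02d}": {f"u{i:02d}", "public"} for i in range(64)}
tree = build_tree(users)
private_receivers, private_checks = route(tree, "u03")
public_receivers, public_checks = route(tree, "public")
```

For a change only one user accepts, a single path is walked, so the number
of checks follows the depth of the tree rather than the number of users. A
change that everyone accepts still visits every node, which is the
write-time cost the quoted text alludes to.
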
>
>
> thanks,
> -natevw

There is another solution, though: replication driven by view changes,
i.e. real replication using a view, as in rcouch [1]. With the
validate_doc_read function [2] you can do that. No need for a background
process. Hopefully this will be merged into CouchDB. It has been tested at
a relatively large scale.

Some other features are also coming soon that will help you manage
replication and changes in CouchDB directly, notably this "virtual
database" idea, though it is more of a direct way to set up that kind of
process.

More info about rcouch:

http://rcouch.org
https://github.com/refuge/refuge-media/blob/master/slides/rcouch_couchdbconf_20130128.pdf

Hope it helps,

- benoît

[1] https://github.com/refuge/rcouch/wiki/View-Changes
[2] https://github.com/refuge/rcouch/wiki/Validate-documents-on-read
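
To make the read-validation idea concrete, here is a minimal sketch of the
concept in Python. It is not rcouch's actual API: in rcouch the function
lives in a design document and is written in JavaScript (see [2] for the
real signature). The "owner" and "readers" document fields, the Forbidden
exception and the readable_ids helper are all assumptions made for
illustration; only the user context shape (a name and a list of roles)
follows what CouchDB passes to its validation functions.

```python
class Forbidden(Exception):
    """Raised when a user may not read a document."""


def validate_doc_read(doc, user_ctx):
    """Return silently if the read is allowed, raise Forbidden otherwise."""
    if "_admin" in user_ctx.get("roles", []):
        return                                  # server admins read everything
    name = user_ctx.get("name")                 # None for anonymous requests
    if name is not None and (
        name == doc.get("owner")                # hypothetical "owner" field
        or name in doc.get("readers", [])       # hypothetical "readers" field
    ):
        return
    raise Forbidden(f"{name} may not read {doc['_id']}")


def readable_ids(docs, user_ctx):
    """Apply the read check to each document and keep the ids that pass."""
    allowed = []
    for doc in docs:
        try:
            validate_doc_read(doc, user_ctx)
        except Forbidden:
            continue
        allowed.append(doc["_id"])
    return allowed


docs = [
    {"_id": "a", "owner": "alice"},
    {"_id": "b", "owner": "bob", "readers": ["alice"]},
    {"_id": "c", "owner": "bob"},
    {"_id": "d"},                               # no owner: admin-only here
]
alice = {"name": "alice", "roles": []}
anonymous = {"name": None, "roles": []}
admin = {"name": "root", "roles": ["_admin"]}
```

Because the check runs per document at read time, a single shared database
can serve different users different subsets, which is why no background
replication process is needed for this part.
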
