Nathan, I'm actually in the process of setting up a multi-tenant environment the canonical way, like you have.
I've seen the replication overhead get pretty intense, but I figure that scaling out to several couches is the way to go once the overhead becomes unbearable. Actually, I was hoping BigCouch would eventually be the answer. Why is this not the case for you?

In one of the links you provided, JasonSmith (on Stack Overflow) said that db per user is the only scalable way. It would be nice if he or someone here could weigh in on why/how that's the only scalable way, especially in light of Nathan claiming the exact opposite.

sb

On Jan 29, 2013, at 10:44 AM, Nathan Vander Wilt <[email protected]> wrote:

> # The problem
>
> It's a fairly common "complaint" that CouchDB's database model does not
> support fine-grained control over reads. The canonical solution is a database
> per user:
> http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
> http://stackoverflow.com/a/4731514/179583
>
> This does not scale.
>
> 1. It complicates formerly simple backup/redundancy: now I need to make sure
> N replications stay working, N databases have correct permissions, instead of
> just one "main" database. Okay, write some scripts, deploy some cronjobs, can
> be made to work...
>
> 2. ...however, if data needs to be shared between users, this model
> *completely falls apart*. Bi-directional continuous filtered replication
> between a "hub" and each user database is extremely resource intensive.
>
> I naïvely followed the Best Practices and ended up with a system that can
> barely support 100 users to a machine due to replication overhead. Now if I
> want to continue doing it "The Right Way" I need to cobble together some sort
> of rolling replication hack at best.
>
> It's apparent the real answer for CouchDB security, right now, is to hide the
> database underneath some middleware boilerplate crap running as DB root. This
> is a well-explored pattern, by which I mean the database ends up with as many
> entry points as a sewer system has grates.
>
>
> # An improvement?
>
> What if CouchDB let you define virtual databases, that shared the underlying
> document data when possible, that updated incrementally (when queried) rather
> than continuously, that could even internally be implemented in a fanout
> fashion?
>
> - virtual databases would basically be part of the internal b-tree key
> hierarchy, sort of like multiple root nodes sharing the branches as much as
> possible
> - sharing the underlying document data would almost halve the amount of disk
> needed versus a "master" database storing all the data which is then copied
> to each user
> - updating incrementally would put less continuous memory pressure on the
> system
> - haven't actually done the maths, so I may be missing something, but
> wouldn't fanning out changes internally from a master database through
> intermediate partitions reduce the processing load?
>
> Basically, rather than each time a user updates a document, copying it to a
> master database, then filtering every M updates through N instances of
> couchjs; instead internally CouchDB could build a tree of combined filters —
> say, master database filters to log(N) hidden partitions at the first level
> and accepted changes would trickle through only relevant further layers. (In
> a way, this is kind of at odds with the incremental nature — maybe it does
> make sense to pay an amortized cost on write rather than on reads.)
>
>
> # The urgency
>
> Maybe this *particular* solution isn't really a solution, but we need one:
>
> If replicating amongst per-user databases is the only correct way to
> implement document-level read permissions, CouchDB **NEEDS** built-in support
> for a scalable way of doing so.
>
> There are plenty of other feature requests I could troll the list with
> regarding CouchApps. But this one is key; everything else I've been able to
> work around behind a little reverse proxy here and in front of an external
> process there. Without scalable read-level security, I see no particular
> raison d'être for Apache CouchDB — if CouchDB can't support direct HTTP
> access in production in general, then it's just another centralized database.
>
>
> thanks,
> -natevw
