Nathan, I'm actually in the process of setting up a multi-tenant environment  
the canonical way, like you have.

I've seen the replication overhead get pretty intense, but I figure that 
scaling out to several couches is the way to go once the overhead becomes 
unbearable.  Actually I was hoping BigCouch would eventually be the answer.  

Why is this not the case for you? 

One of the links you provided (JasonSmith@stackoverflow) says that db per 
user is the only scalable way.  It would be nice if he or someone here could 
weigh in on why/how that's the only scalable way, especially in light of Nathan 
claiming the exact opposite.

sb

On Jan 29, 2013, at 10:44 AM, Nathan Vander Wilt <[email protected]> 
wrote:

> # The problem
> 
> It's a fairly common "complaint" that CouchDB's database model does not 
> support fine-grained control over reads. The canonical solution is a database 
> per user:
> http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
> http://stackoverflow.com/a/4731514/179583
> 
> This does not scale.
> 
> 1. It complicates formerly simple backup/redundancy: now I need to make sure 
> N replications stay working, N databases have correct permissions, instead of 
> just one "main" database. Okay, write some scripts, deploy some cronjobs, can 
> be made to work...
> 
> 2. ...however, if data needs to be shared between users, this model 
> *completely falls apart*. Bi-directional continuous filtered replication 
> between a "hub" and each user database is extremely resource intensive.
> 
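
[inline] To make the replication count in point 2 concrete, here is a minimal
sketch of the documents a hub-and-spoke setup would put in the `_replicator`
database. The fields `source`, `target`, `continuous`, `filter` and
`query_params` are the standard replication document fields; everything else
is an assumption for illustration: the `userdb-<name>` naming, the
`app/for_user` filter, the document ids, and the choice to filter only the
hub-to-user direction.

```python
def replication_docs(hub, users, filter_name="app/for_user"):
    """Build the _replicator documents for a hub plus one database per user.

    Names are hypothetical: "userdb-<name>" databases and an "app/for_user"
    filter function that reads the user from the request query parameters.
    """
    docs = []
    for user in users:
        user_db = f"userdb-{user}"
        # Hub -> user: filtered, so each user only receives their own documents.
        docs.append({
            "_id": f"hub-to-{user}",
            "source": hub,
            "target": user_db,
            "continuous": True,
            "filter": filter_name,
            "query_params": {"user": user},
        })
        # User -> hub: assumed unfiltered here; a real setup may filter this too.
        docs.append({
            "_id": f"{user}-to-hub",
            "source": user_db,
            "target": hub,
            "continuous": True,
        })
    return docs


if __name__ == "__main__":
    docs = replication_docs("hub", [f"user{i}" for i in range(100)])
    print(len(docs))  # two continuous replications per user
```

With 100 users that is 200 continuous replications on one machine, half of
them running a filter function on every change in the hub.
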
> I naïvely followed the Best Practices and ended up with a system that can 
> barely support 100 users to a machine due to replication overhead. Now if I 
> want to continue doing it "The Right Way" I need to cobble together some sort 
> of rolling replication hack at best.
> 
> It's apparent the real answer for CouchDB security, right now, is to hide the 
> database underneath some middleware boilerplate crap running as DB root. This 
> is a well-explored pattern, by which I mean the database ends up with as many 
> entry points as a sewer system has grates.
> 
> 
> # An improvement?
> 
> What if CouchDB let you define virtual databases, that shared the underlying 
> document data when possible, that updated incrementally (when queried) rather 
> than continuously, that could even internally be implemented in a fanout 
> fashion?
> 
> - virtual databases would basically be part of the internal b-tree key 
> hierarchy, sort of like multiple root nodes sharing the branches as much as 
> possible
> - sharing the underlying document data would almost halve the amount of disk 
> needed versus a "master" database storing all the data which is then copied 
> to each user
> - updating incrementally would put less continuous memory pressure on the 
> system
> - haven't actually done the maths, so I may be missing something, but 
> wouldn't fanning out changes internally from a master database through 
> intermediate partitions reduce the processing load?
> 
> Basically, rather than each time a user updates a document, copying it to a 
> master database, then filtering every M updates through N instances of 
> couchjs; instead internally CouchDB could build a tree of combined filters — 
> say, master database filters to log(N) hidden partitions at the first level 
> and accepted changes would trickle through only relevant further layers. (In 
> a way, this is kind of at odds with the incremental nature — maybe it does 
> make sense to pay an amortized cost on write rather than on reads.)
> 
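
[inline] A rough way to sanity-check the fan-out idea above is to count filter
evaluations. This is only a toy model under strong assumptions, not how
CouchDB works internally: each user is interested in a set of "channels", each
change carries one `channel` field, and there is a single intermediate layer
of partitions whose combined filter is the union of its members' channels.
All names (`flat_fanout`, `two_level_fanout`, `channel`) are made up for the
sketch.

```python
def flat_fanout(changes, interests):
    """Test every change against every user's filter (the current model)."""
    calls = 0
    delivered = set()
    for doc in changes:
        for user, channels in interests.items():
            calls += 1
            if doc["channel"] in channels:
                delivered.add((doc["_id"], user))
    return calls, delivered


def two_level_fanout(changes, interests, partition_size):
    """Test each change against combined partition filters first, then only
    against the users inside partitions that accepted it."""
    users = sorted(interests)
    partitions = [users[i:i + partition_size]
                  for i in range(0, len(users), partition_size)]
    # Combined filter of a partition: union of its members' channels.
    combined = [set().union(*(interests[u] for u in part))
                for part in partitions]
    calls = 0
    delivered = set()
    for doc in changes:
        for part, channels in zip(partitions, combined):
            calls += 1  # one evaluation of the combined filter
            if doc["channel"] not in channels:
                continue
            for user in part:
                calls += 1  # per-user evaluation inside a matching partition
                if doc["channel"] in interests[user]:
                    delivered.add((doc["_id"], user))
    return calls, delivered


if __name__ == "__main__":
    # 100 users, each interested only in their own channel; 10 changes.
    interests = {f"user{i}": {f"user{i}"} for i in range(100)}
    changes = [{"_id": f"doc{i}", "channel": f"user{i}"} for i in range(10)]
    print(flat_fanout(changes, interests)[0])           # 1000
    print(two_level_fanout(changes, interests, 10)[0])  # 200
```

Both functions deliver the same documents to the same users. The saving
depends entirely on the interests being mostly disjoint; when many users share
the same data, most partitions accept most changes and the extra layer only
adds work.
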
> 
> # The urgency
> 
> Maybe this *particular* solution isn't really a solution, but we need one:
> 
> If replicating amongst per-user databases is the only correct way to 
> implement document-level read permissions, CouchDB **NEEDS** built-in support 
> for a scalable way of doing so.
> 
> There are plenty of other feature requests I could troll the list with 
> regarding CouchApps. But this one is key; everything else I've been able to 
> work around behind a little reverse proxy here and in front of an external 
> process there. Without scalable read-level security, I see no particular 
> raison d'être for Apache CouchDB — if CouchDB can't support direct HTTP 
> access in production in general, then it's just another centralized database.
> 
> 
> thanks,
> -natevw
