Re: Half-baked idea: incremental virtual databases

Benoit Chesneau Tue, 29 Jan 2013 20:05:44 -0800

On Tue, Jan 29, 2013 at 7:44 PM, Nathan Vander Wilt <
[email protected]> wrote:


> # The problem
>
> It's a fairly common "complaint" that CouchDB's database model does not
> support fine-grained control over reads. The canonical solution is a
> database per user:
> http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
> http://stackoverflow.com/a/4731514/179583
>
> This does not scale.
>
> 1. It complicates formerly simple backup/redundancy: now I need to make
> sure N replications stay working, N databases have correct permissions,
> instead of just one "main" database. Okay, write some scripts, deploy some
> cronjobs, can be made to work...
>
> 2. ...however, if data needs to be shared between users, this model
> *completely falls apart*. Bi-directional continuous filtered replication
> between a "hub" and each user database is extremely resource intensive.
>
> I naïvely followed the Best Practices and ended up with a system that can
> barely support 100 users to a machine due to replication overhead. Now if I
> want to continue doing it "The Right Way" I need to cobble together some
> sort of rolling replication hack at best.
>
> It's apparent the real answer for CouchDB security, right now, is to hide
> the database underneath some middleware boilerplate crap running as DB
> root. This is a well-explored pattern, by which I mean the database ends up
> with as many entry points as a sewer system has grates.
>
>
> # An improvement?
>
> What if CouchDB let you define virtual databases, that shared the
> underlying document data when possible, that updated incrementally (when
> queried) rather than continuously, that could even internally be
> implemented in a fanout fashion?
>
> - virtual databases would basically be part of the internal b-tree key
> hierarchy, sort of like multiple root nodes sharing the branches as much as
> possible
> - sharing the underlying document data would almost halve the amount of
> disk needed versus a "master" database storing all the data which is then
> copied to each user
> - updating incrementally would put less continuous memory pressure on the
> system
> - haven't actually done the maths, so I may be missing something, but
> wouldn't fanning out changes internally from a master database through
> intermediate partitions reduce the processing load?
>
> Basically, rather than each time a user updates a document, copying it to
> a master database, then filtering every M updates through N instances of
> couchjs; instead internally CouchDB could build a tree of combined filters
> — say, master database filters to log(N) hidden partitions at the first
> level and accepted changes would trickle through only relevant further
> layers. (In a way, this is kind of at odds with the incremental nature —
> maybe it does make sense to pay an amortized cost on write rather than on
> reads.)
>
>
> # The urgency
>
> Maybe this *particular* solution isn't really a solution, but we need one:
>
> If replicating amongst per-user databases is the only correct way to
> implement document-level read permissions, CouchDB **NEEDS** built-in
> support for a scalable way of doing so.
>
> There are plenty of other feature requests I could troll the list with
> regarding CouchApps. But this one is key; everything else I've been able to
> work around behind a little reverse proxy here and in front of an external
> process there. Without scalable read-level security, I see no particular
> raison d'être for Apache CouchDB — if CouchDB can't support direct HTTP
> access in production in general, then it's just another centralized
> database.
>
>
> thanks,
> -natevw
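
For context, the hub-and-spoke setup described in the quoted message comes down to two continuous replications per user. Below is a rough Python sketch of the documents one would write to the _replicator database. The database names (hub, userdb-<name>), the filter name app/by_user and the "user" query parameter are hypothetical, chosen only for illustration. Filtering only the hub-to-user direction is also an assumption.

```python
import json


def replication_docs(hub, users, filter_name="app/by_user"):
    """Build the continuous replication documents needed for each user.

    All names here are illustrative; only the field names (source, target,
    continuous, filter, query_params) are standard replication document fields.
    """
    docs = []
    for user in users:
        user_db = f"userdb-{user}"  # hypothetical naming scheme
        # hub -> user: only documents the filter function accepts for this user
        docs.append({
            "_id": f"{hub}-to-{user_db}",
            "source": hub,
            "target": user_db,
            "continuous": True,
            "filter": filter_name,
            "query_params": {"user": user},
        })
        # user -> hub: push the user's own changes back (unfiltered here)
        docs.append({
            "_id": f"{user_db}-to-{hub}",
            "source": user_db,
            "target": hub,
            "continuous": True,
        })
    return docs


if __name__ == "__main__":
    docs = replication_docs("hub", ["alice", "bob"])
    print(len(docs), "replication documents for 2 users")
    print(json.dumps(docs[0], indent=2))
```

The number of long-running replications grows as 2N with the number of users, and every change on the hub is run through the filter once per user, which is where the overhead mentioned above comes from.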



There is another solution though: replication driven by view changes, and
real replication using a view, as in rcouch [1]. With the validate_doc_read
function [2] you can do that. No need for a background process. Hopefully
this will be merged into CouchDB. It has been tested at a relatively large
scale.

Some other features are also coming soon that will help you manage
replication and changes in CouchDB directly, notably this "virtual database"
thing, though it's more a direct way to set up this kind of process.

More info about rcouch: http://rcouch.org &
https://github.com/refuge/refuge-media/blob/master/slides/rcouch_couchdbconf_20130128.pdf

Hope it helps,

- benoît

[1] https://github.com/refuge/rcouch/wiki/View-Changes
[2] https://github.com/refuge/rcouch/wiki/Validate-documents-on-read
