Hi, Chris. I totally agree with the requirements this idea serves. I've got some quick questions inline.

On Sun, Aug 28, 2011 at 2:39 AM, Chris Anderson <[email protected]> wrote:

> So for instance if I have a private database setup for my message
> browsing couchapp to run in, and there is a public database on a
> server I trust, that runs reader_acls, then I can set up continuous
> replication from there. Anyone in my organization who wants to
> circulate a document among an adhoc group of people, would drop it in
> the shared database, with the group of folks listed on the document.
> Then it would be visible to them as they replicate, but not to anyone
> else.

What would a _changes query look like to me? Would it look like filtered replication, where I simply never see updates that aren't approved? But if I never see updates, how does the _changes responder know which to show and which to hide? Would it fetch each doc and read the _acl value?

OTOH, if I can see update records, what would be the value of the "doc" field if I query include_docs=true? Even if the "doc" value is null (to hide it from the user), leaking the _id and revs (the "changes" field) may have security implications. An _id might have an email address or other private information. A _rev (I'm reaching here, but stick with me) might also leak information. Consider an auction with secret bidding. Each lot is a document. Observing how _rev values change might inform you how frequently bids are made on the lot, hinting at which lots are selling well and which are going to be cheap. And the _id might tell you which lot it is.

What will deleted documents look like? Against all reason and propriety, people are storing data in "deleted" documents--in production! (They say it is for auditing.) People also use HTTP DELETE, as well as updates like {"_id":"foo","_deleted":true}. When I replicate, it would be nice to receive this delete event; but I am not on the ACL anymore. Or can everybody see a deleted document?
But then couldn't they see the extra data in there for auditors' eyes only (also the problematic _id and _rev trees)? Or do deleted docs obey ACLs just like any other document revision? That's a shame, because HTTP DELETE would implicitly strip all users from the ACL. Or does HTTP DELETE trigger ACL inheritance? Which revision does it inherit from?

> Doing this is possible today but it involves a bunch of filtered
> replication and app code to enforce that filters are applied.

This is a very important statement. I take you to mean there are multiple solutions to this problem; and by implication ACLs are the best solution.

> Providing an optional shared or reader_acl mode for use at sync points
> seems like a user friendly way to simplify something people already
> want to do.
>
> A potential design:
>
> On the _security object setting reader_acl = true would enable the
> reader access control lists, and make _views and _lists (and geocouch,
> etc) into admin-only resources.

Would another security model possibly come along later? If there are choices among mutually-exclusive models, maybe it should be mode = "acl". For example, a blacklist-based security policy might be neat. Or maybe the "closed source couch app" where ddocs aren't visible but _show, _list, and _update are. You wouldn't want those all enabled at the same time, would you? Or would you?!? :) Maybe this is bikesheddy at this stage.

> I'm imagining the way the reader ACLs would look on the documents is a
> new top level field "_acl" that has a similar names/roles value
> structure as the _security object:

A note about CouchDB adoption and comprehensibility: usually ACLs (especially role-based systems like Couch) are not simply one list of readers; but a matrix with rights as columns and roles as rows; and you have yes/no values for roles/rights combinations.
Perhaps a bit more structure, then:

    { "_id": "someid"
    , "_acl":
      { "read": {"names": [...], "roles": [...]}
      , "write": {"names": [...], "roles": [...]}
      , "can_update_on_tuesdays": {"names"..., "roles"...}
      }
    }

validate_doc_update could still implement the "write" and Tuesday-updates support, but I propose you give them a namespace to work with.

>
> {
>   _id : "someid",
>   foo : "bar",
>   _acl : {
>     names : ["[email protected]"]
>     roles : ["aliens", "dogs"]
>   }
>
> So this a document that can be read by me, and also by any aliens or dogs.

What is the response for docs with _acl undefined? What is the response for docs with _acl = {}? What is the response for docs with _acl = {"names":[], "roles":[]}? Are all three responses the same or do they differ? For the third version, is it like the _security behavior where that means everybody can read? Or is it *dissimilar* from _security but more like people's expectations where *nobody* can read (except admins)?

> How do people feel about this proposal?

A final thought is that _security does not replicate. Developers already have to keep _security objects correct and synchronized by hand. If they forget reader_acl=true then the database is exposed. Even now, if you create a database, it is totally public by default. That is a hurdle for developers to clear when deploying.

You said this is all doable today. I wonder if ACLs are a comprehensive solution, or give the best bang-for-buck. CouchDB is assembly language and we are all writing in assembler. I wish we could program with more abstract concepts and they would compile down to the existing API. Consider a declarative way to tell Couch, "This is public data, and that per-user data, and these are the users and those are the ACLs. Please do, you know, all that stuff the mailing list tells me to do manually." And Couch took care of per-user DBs, filtered replication, routing queries to the correct DB, and all of that hard stuff.
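
To make the declarative idea a little more concrete, here is a rough Python sketch of what such an external tool might do. Nearly everything in it is an assumption made for illustration: the policy layout, the "userdb-" naming scheme, and the "acl/by_reader" filter name are invented, and treating a missing _acl as "nobody can read" is just one possible answer to the question asked earlier. Only the replication document fields (source, target, filter, query_params, continuous) follow existing CouchDB conventions.

```python
# Hypothetical declarative policy; the layout is invented for illustration.
POLICY = {
    "shared_db": "shared",
    "users": ["alice", "bob"],
}


def user_db(name):
    """Per-user database name (the 'userdb-' prefix is an assumption)."""
    return "userdb-" + name


def compile_policy(policy):
    """Expand the policy into the per-user databases to create and the
    filtered, continuous replication documents to store."""
    databases = [user_db(u) for u in policy["users"]]
    replications = [
        {
            "source": policy["shared_db"],
            "target": user_db(u),
            # Hypothetical filter function in a design doc named "acl".
            "filter": "acl/by_reader",
            "query_params": {"reader": u},
            "continuous": True,
        }
        for u in policy["users"]
    ]
    return databases, replications


def readable_by(doc, name, roles=()):
    """Model of what the hypothetical filter would check.

    Assumption: a missing or empty _acl means nobody (except admins)
    can read; the proposal leaves this open."""
    acl = doc.get("_acl") or {}
    return name in acl.get("names", []) or any(
        r in acl.get("roles", []) for r in roles
    )
```

Calling compile_policy(POLICY) yields one database name and one replication document per user; a real tool would still have to create those databases, install the filter, and route queries, which is exactly the hard part.
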
I like how this alternative could largely be prototyped with external tools and run on any couch. (Although my point is to eventually build it into couch.)

--
Iris Couch
