One addition: _access slots into the existing security mechanisms as follows:
1. Check if a user is in _security
2. If yes, check if the user is in _access (modulo read/write)
3. If yes, does the doc update pass any globally defined VDUs
4. If yes, the operation can proceed.

Cheers
Jan
--

> On 10. Mar 2019, at 15:51, Jan Lehnardt <j...@apache.org> wrote:
>
> Hey all,
>
> after mulling this over some more, I’d like to tackle the detailed API and
> behaviour for this. Especially how _access works in conjunction with
> existing access control features.
>
> My guiding principles so far are:
>
> 1. Make the API intuitive: things should work the way they look like they
> should work.
> 2. The default should never be that a resource is accidentally left
> accessible to the public.
> 3. This should work as a natural extension to the existing security
> features*.
>
> * I’d be up for reworking the whole lot, too, but that might be a better
> discussion for 4.0.
>
>
> ## Database Creation and Default Behaviours
>
> Creating a database with _access features is, as mentioned before, done via
> a flag to PUT /database?access=true
>
> In a 3.0 world where this would land, we already agreed that databases should
> be admin-only by default (instead of world read/writeable today). This is a
> sensible default, but that leaves us with an _access enabled database that
> can’t be used by anyone but server or db admins. Not very useful.
>
> To allow arbitrary users to use the db, I suggest we use the existing
> _security system: i.e. if a user or a group a user belongs to is mentioned in
> either `admins` or `members` inside of _security, they can proceed and create
> documents on the db. This puts a second step burden on the application
> developer, but it slots cleanly into the existing security mechanisms, and
> doesn’t require special case handling. Alternatively, we could define that
> _security isn’t available in _access enabled databases, but that’s something
> I’d like to avoid if at all possible.
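
A minimal sketch of the check order listed at the top of this message, written in Python purely for illustration (CouchDB itself is Erlang). The `UserCtx` class, the function names, and modelling VDUs as callables that return a boolean are assumptions of this sketch, not part of the proposal; the read/write distinction is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserCtx:
    # Hypothetical stand-in for CouchDB's userCtx (name plus roles).
    name: str
    roles: List[str] = field(default_factory=list)


def in_security(user: UserCtx, security: Dict) -> bool:
    """Step 1: the user or one of their roles is in admins or members."""
    for section in ("admins", "members"):
        entry = security.get(section, {})
        if user.name in entry.get("names", []):
            return True
        if set(user.roles) & set(entry.get("roles", [])):
            return True
    return False


def in_access(user: UserCtx, access: List[str]) -> bool:
    """Step 2: the doc's _access list names the user or one of their roles."""
    return user.name in access or bool(set(user.roles) & set(access))


def may_update(
    user: UserCtx,
    security: Dict,
    existing_access: List[str],
    new_doc: Dict,
    vdus: List[Callable[[Dict, UserCtx], bool]],
) -> bool:
    """Run the checks in order; any failure stops the update."""
    if not in_security(user, security):
        return False
    if not in_access(user, existing_access):
        return False
    # Step 3: every globally defined VDU must accept the update.
    return all(vdu(new_doc, user) for vdu in vdus)
```

The ordering means a user who is listed in a doc’s _access but not in the database’s _security is still rejected at step 1.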
>
> In order to make it easy to specify that “everyone in _users” should be able
> to use the db, I suggest we add a new role `_users` that is valid inside
> _security, which means “everyone in /_users” (this only excludes server
> admins which have full access anyway).
>
> * * *
>
>
> ## Document Creation and Access Control
>
> Next, one of our non-admin users creates a doc. There are multiple options as
> to how we store the _access information.
>
> 1. Automatically translate the userCtx.name of a doc creation (not an update)
> into the first element of the _access array. E.g. user_a PUT /db/doc {"a":1}
> creates this doc: {"a":1,"_access":["user_a"]}. This is a little bit
> counter-intuitive.
>
> 2. We require that a user puts "_access":["user_a"] in themselves. This is an
> explicit granting of access permissions on doc creation and I think is
> preferable.
>
> This leaves the edge case of docs that have no _access member: so far I
> thought those docs are admin-only, with maybe a db-wide option to swap the
> default to public access, but I think given the explicitness of 2. we can do
> better: require _access for all new doc creations in access-enabled
> databases. A user cannot create a new document without an _access field that
> is an array that has at least one member. For public documents, we could
> invent a new role _public, and admin-only docs could use the existing role
> _admin.
>
> The one downside to this approach is that we won’t be able to replicate
> existing databases into an access-enabled database without modifying all
> documents. This might be a worthwhile trade-off, but we should make that
> decision consciously and document it well. We could allow for a special case
> where an _admin user can create docs that have no _access field, and those
> docs are treated as having only the _admin role in _access.
> So at least we could replicate all data in, but then require a manual step
> to update all docs to, say, migrate an existing db-per-user app, while not
> accidentally exposing any docs to folks that shouldn’t read them.
>
> For the rest of cRUD, the existing document must store one of the RUD-ing
> user’s name or role in its _access field.
>
> For both creations and updates, a user MUST supply at least one role they
> belong to or their own username.
>
> * * *
>
>
> ## _revs_diff
>
> /db/_revs_diff can answer the question of which revisions of a document do
> NOT exist on a replication target:
> http://docs.couchdb.org/en/stable/api/database/misc.html#db-revs-diff
>
> This would allow users to specify ids and rev(s) for docs they don’t have
> access to (anymore), so the result schema should be expanded to handle id:
> unauthorized or somesuch, something the replicator needs to know what to do
> with if it encounters it (say a user got removed from the _access list
> in between the replicator opening _changes and requesting the doc).
>
> The _revs_diff implementation would have to be altered to send an
> unauthorized token for each doc the requesting userCtx has no access to. If
> we can re-use some of our existing indexes, or any other performance
> optimisation, that’d be great. I haven’t looked at that code at all, yet.
>
> An important side-effect of this is that once a user has been added to a
> doc’s _access list, they get access to “the full history of the doc”, even
> from before they had access. Of course, in CouchDB this means only getting
> access to the rev ids, and not the content, but since they are
> content-addressable hashes, a user could brute-force themselves into
> revealing certain real values from earlier incarnations of the doc. I’d
> rather not track _access per document revision in perpetuity, so this is
> something we have to be very up-front about.
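
To make the expanded _revs_diff result from this section concrete, here is a rough Python sketch. The in-memory `store` layout, the function signature, and the `{"error": "unauthorized"}` shape are all assumptions for illustration; the proposal only says the schema should carry “id: unauthorized or somesuch”.

```python
from typing import Dict, List


def revs_diff(
    request: Dict[str, List[str]],
    store: Dict[str, Dict],
    user_name: str,
    user_roles: List[str],
) -> Dict[str, Dict]:
    """Report missing revs per doc id, or an unauthorized marker.

    store maps doc id -> {"revs": set of known rev ids,
                          "access": list of user names and roles}.
    """
    identities = {user_name, *user_roles}
    result: Dict[str, Dict] = {}
    for doc_id, revs in request.items():
        doc = store.get(doc_id)
        if doc is None:
            # Unknown doc: every requested rev is missing on the target.
            result[doc_id] = {"missing": list(revs)}
        elif not identities & set(doc["access"]):
            # Hypothetical marker the replicator would have to understand.
            result[doc_id] = {"error": "unauthorized"}
        else:
            missing = [rev for rev in revs if rev not in doc["revs"]]
            if missing:
                result[doc_id] = {"missing": missing}
    return result
```

A replicator seeing the unauthorized marker would have to decide whether to skip the doc or abort, which is the open question raised above.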
>
> * * *
>
>
> ## Partitioned Databases
>
> I mentioned partitioned databases in my previous mail, and I think it is
> something we can document that end-users can opt into, but doesn’t require
> any special casing on the _access proposal. That is, if users start prefixing
> their doc ids with a user name or id and enable both _access and partitions,
> then they get all the benefits of a partitioned database, and if they choose
> not to, they don’t, but things keep working. There are enough use-cases to
> warrant both behaviours.
>
> * * *
>
>
> ## Scenarios that _access should help with.
>
> Overall, we developed _access to allow users to stop using the db-per-user
> architecture, but once we have per-doc-access control, folks might start
> using this for all manner of things. We should be clear about which scenarios
> we support and which we don’t.
>
>
> ### Scenario 1: db-per-user
>
> In this scenario, before _access enabled databases, the only way to allow
> mutually untrusting users to store data in a part of CouchDB that only they
> (and admins) have access to was giving each user their own database.
>
> In an _access enabled database, users can CRUD/_changes/_all_docs/_revs_diff
> their own docs knowing no other user (aside from admins) can access those
> docs.
>
> This is the simplest scenario, as all we’d have to do is track the owner of a
> document and produce by-access-id/seq indexes based on that owner.
>
> The current prototype implementation mostly reflects this stage. Not saying
> this is what we should ship, but it is the easiest to implement and explain.
>
> Aside, I might be able to be persuaded to ship this as a 2.x feature, to help
> those folks who don’t need anything else.
>
>
> ### Scenario 2: db-per-user + Sharing
>
> The second we allow per doc auth, users will want to share those docs with
> other users. That’s why we initially suggested the _access field be an array,
> so other users and groups can be specified to have access.
> There are multiple scenarios in this one alone:
>
> #### 2.1: The Todo List
>
> In this scenario, a user has a reasonable amount of “personal data” that they
> want to selectively share with one or more other users.
>
> #### 2.2: The Chat/Forum/Newsgroup
>
> In this scenario, a user wants to share any number of documents with a
> reasonable number of groups. However, since we need to limit the number of
> groups a user belongs to (currently 10, see below for details), this might
> actually not be a great solution. Or folks couldn’t be in more than 10 chat
> groups at a time.
>
> #### 2.3: The Corporate Hierarchy
>
> In this scenario, users want to share any number of docs with a reasonable
> number of groups in a top-down/bottom-up fashion. Think CEO shares with
> executives, execs share with divisions, divisions report up to their one
> executive, etc.
>
>
> ### Scenario 3: Multiple Apps
>
> The preceding scenarios all assume that a single application is responsible
> for everything. However, once we allow mutually distrusting users into a
> single database *and* make each per-user slice work (almost) like a full
> standalone CouchDB database, what would stop users from using this for a
> multi-homing feature, where different applications are used for each user in
> the same database?
>
> I’ll be referring to these scenarios down the line.
>
> * * *
>
>
> ## Design Docs
>
> ### Admin
>
> One of the downsides of db-per-user is managing design docs in the face of a
> changing application, that is, how to distribute new design docs across tens
> of thousands of user dbs or more? It’s not impossible, but tedious. In all
> scenarios above but scenario 3., we could simplify this significantly.
> Say an admin creates a design doc and gives all users in the db access to it
> (this could be with the _users role, or yet another new role _members, if we
> need it). Requesting the result of a view defined in that design doc will
> then produce an index that is powered by the requesting user’s by-access-seq
> index section(s).
>
> N.B., this would require us to change a fundamental assumption when doing the
> association between a design doc’s definition and index: normally, there is
> only the `views` member that is hashed and that hash is used as the index’s
> filename. Because there is only by-seq to power a view, that all works. But
> now that we have an arbitrary set of sections on by-access-seq, any view
> index built will have to take a user’s name and roles into account. When a
> user leaves a group, or gains a group, all indexes for that user will no
> longer be valid and need rebuilding.
>
>
> ### User
>
> In any of the scenarios above, but especially 3., there could be legitimate
> per-user design docs, so how should those be treated in an _access enabled
> database?
>
> The significant fields in a design doc are `views`, `validate_doc_update` and
> `filters` (I’ll skip over the deprecated _show, _list, and _update).
>
> The easiest to handle is `filters`: if a user specifies a filter for a
> _changes request or replication that lives in a design doc they don’t have
> access to, they get an error, similar to if they specify a non-existent
> design doc, just with `unauthorized` instead of `not_found`.
>
> Next, `views` is also not very hard to imagine working: just like globally
> defined views for that db, the index is built for each user based on the
> user’s name and roles.
>
> More troubling are `validate_doc_update` functions: One, they are already
> troubling in that they slow down any document updates.
> Two, if we now import an existing db-per-user scenario where each user has
> their own design docs, how should we apply validate_doc_update functions?
> Tens of thousands of VDUs are impractical to apply on each doc update, let
> alone just the management of VDUs that are active on a database. One option
> would be to ignore VDUs if they are not defined globally (say with a _members
> role). Especially in scenario 3. this becomes problematic, and even without
> that specific scenario, it violates the no-surprises best practice.
>
> We could say:
>
> a) we don’t support scenario 3.
> b) we find a complicated but efficient way to apply only those VDUs that are
> defined in design docs the writing user has access to plus any global ones
> (this would be neat but rather complicated and potentially still impractical
> from a performance perspective for N users).
> c) we could store all per-user design docs, but ignore them completely, VDUs,
> views and filters.
>
> I think I currently fall on the side of not supporting scenario 3. and asking
> folks who migrate db-per-user to de-duplicate design docs and keep them
> per-app. I believe that is a good trade-off between the most common scenarios
> for db-per-user while keeping the implementation manageable. Globally
> accessible design docs would show up in a user’s changes feed and would
> replicate down to, say, a PouchDB application which might be the exclusive
> user of those design docs.
>
> In practice this would mean that a document with an _id that starts with
> _design/ will have to be produced by a database admin. Luckily, that’s
> already the case. We should just make sure that folks don’t give db-admin
> access to all users habitually.
>
>
> ## Read and Write Access
>
> Speaking of validate_doc_update, it is used for two things: checking document
> schema and doc update authorisation.
>
> Once we allow access to a document with an _access field, we need to decide
> what kind of access this gives to a doc: read-only or read-write (I’m not
> considering write-only because for anything but doc creations this is not
> useful as you need access to the current _rev).
>
> However, when we look at implementing an application on top of our existing
> API, it is already weird that read access can be controlled globally (or with
> _access on a per doc level), but write access requires writing JavaScript
> code. I think it would be reasonable for users to expect a per-doc
> read/write permission granting.
>
> So we could have all of the above, but with two extra fields: _access_read
> and _access_write, or _access: {read: [], write: []} or we overload user and
> group names: _access: [user_a:read, user_b:write] (or any permutation
> thereof). Overloading can cause trouble with naturally occurring characters
> in group names.
>
> The former seems more explicit, but from an API perspective that’s a little
> more awkward: remember that we currently have an arbitrary limit of 10
> members in a user’s role array, to avoid excessive fan out on
> cluster-internal operations. Partitioned dbs could get away with more, more
> easily however. If we allow the specification of access control in two lists,
> and one of the lists implies membership in the other, we have a total limit
> of 10 members across both arrays. Or we limit 5 + 5, but that seems
> excessive, while 10 total seems weird, but doable. Anyway, good bikeshed.
>
>
> * * *
>
>
> So far, I think all of the problems outlined are solvable, if only with a
> clear definition of what use-cases we do not support with _access. If you
> have more scenarios than the ones I outlined, please add them and we can see
> if they cause any additional trouble.
>
> Thanks for reading this far and I’m looking forward to your feedback.
>
>
> Best,
> Jan “_access” Lehnardt
> --
>
>> On 17. Feb 2019, at 15:25, Jan Lehnardt <j...@apache.org> wrote:
>>
>> Hi Everyone,
>>
>> I’m happy to share my work in progress attempt to implement the per-doc
>> access control feature we discussed a good while ago:
>>
>> https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
>>
>> You can check out my branch here:
>>
>> https://github.com/apache/couchdb/compare/access?expand=1
>>
>> It is very much work in progress, but it is far enough along to warrant
>> discussion.
>>
>> The main point of this branch is to show all the places that we would need
>> to change to support the proposal.
>>
>> Things I’ve left for later:
>>
>> - currently only the first element in the _access array is used. Our and/or
>> syntax can be added later.
>> - building per-access views has not been implemented yet, couch_index would
>> have to be taught about the new per-access-id index.
>> - pretty HTTP error handling
>> - tests except for a tiny shell script 😇
>>
>> Implementation notes:
>>
>> You create a database with the _access feature turned on like so: PUT
>> /db?access=true
>>
>> I started out with storing _access in the document body, as that would allow
>> for a minimal change set. However, on doc updates, we try hard not to load
>> the old doc body from the database, and forcing us to do so for EVERY doc
>> update under _access seemed prohibitive, so I extended the #doc, #doc_info
>> and #full_doc_info records with a new `access` attribute that is stored in
>> both by-id and by-seq. I will need guidance on how extending these records
>> impacts multi-version cluster interop. And especially whether this is an
>> acceptable approach.
>>
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36
>>
>> * * *
>>
>> The main addition is a new native query server called
>> couch_access_native_proc, which implements two new indexes, by-access-id and
>> by-access-seq, which do what you’d expect: pass in a userCtx and retrieve the
>> equivalent of _all_docs or _changes, but only including those docs that
>> match the username and roles in their _access property. The existing
>> handlers for _all_docs and _changes have been augmented to use the new
>> indexes instead of the default ones, unless the user is an admin.
>>
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4
>>
>> * * *
>>
>> The rest of the diff is concerned with making document CRUD behave as you’d
>> expect it. See this little demonstration for what things look like:
>>
>> https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497
>> (I’m just noticing that there might be something wonky with DELETE, but
>> you’ll get the gist #rimshot)
>>
>> * * *
>>
>> Open questions:
>>
>> - The aim of this is to get as close to regular CouchDB behaviour as
>> possible. One thing that is new, however, and which would require all apps
>> to be changed, is that apps using an _access enabled database have to
>> include an _access field in their docs (docs with no _access are admin-only
>> for now). We might want to consider auto-inserting, on new document writes,
>> the authenticated user’s name as the first element in the _access array, so
>> existing apps “just work”.
>>
>> - Interplay with partitioned dbs: eschewing db-per-user is already a large
>> boon if you have a lot of users, but making those per-user requests inside
>> an _access enabled database efficient would be doubly nice, so why not use
>> the username from the first question above and use that as the partition
>> key?
>> This would work nicely for natural users with their own docs that want
>> to share them with others later, but I can easily imagine a pipelined use of
>> CouchDB, where a “collector” user creates all new docs, an “analyser” takes
>> them over and hands them to a “result” user for viewing. In that case, we’d
>> violate the high-cardinality rule of partitions (have a lot of small ones);
>> instead all docs go through all three users. I’d be okay with treating the
>> latter scenario as a minor use-case, but for that use-case, we should be
>> able to disable auto-partitioning on db creation.
>>
>> - building access view indexes for docs that have frequent _access changes
>> leads to many orphaned view indexes, so we should look at an auto-cleanup
>> solution here (maybe keep 1-N indexes in case folks just swap back and
>> forth).
>>
>> * * *
>>
>> I’ll leave this here for now, I’m sure there are a few more things to
>> consider.
>>
>> I’d love to hear any and all feedback you might have. Especially if anything
>> is unclear.
>>
>> Best
>> Jan
>> --
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>