Re: [DISCUSS] Per-doc access control

Jan Lehnardt Sun, 10 Mar 2019 09:03:56 -0700

One addition, the slotting in of _access into existing security mechanisms is 
as follows:


1. Check if a user is in _security
2. If yes, check if user is in _access (modulo read/write)
3. If yes, does the doc update pass any globally defined VDUs
4. If yes, operation can proceed.
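
Roughly, and only as an illustration in Python (this is not the Erlang in the branch), the order of checks could look like the sketch below. `UserCtx`, `may_proceed` and the shape of the VDU callables are made up for the example; the `_security` layout with `names` and `roles` under `admins` and `members` is the existing one. The read/write distinction is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserCtx:
    """Hypothetical stand-in for the request's userCtx."""
    name: str
    roles: List[str] = field(default_factory=list)


def in_security(user: UserCtx, security: Dict) -> bool:
    """Step 1: is the user, or one of their roles, listed in _security?"""
    for section in ("admins", "members"):
        entry = security.get(section, {})
        if user.name in entry.get("names", []):
            return True
        if any(role in entry.get("roles", []) for role in user.roles):
            return True
    return False


def in_access(user: UserCtx, doc: Dict) -> bool:
    """Step 2: is the user's name or one of their roles in the doc's _access?"""
    access = doc.get("_access", [])
    return user.name in access or any(role in access for role in user.roles)


def may_proceed(user: UserCtx, security: Dict, doc: Dict,
                vdus: List[Callable[[Dict, UserCtx], bool]]) -> bool:
    """Steps 1 to 4 in order; each VDU returns True when it accepts the write."""
    if not in_security(user, security):
        return False
    if not in_access(user, doc):
        return False
    return all(vdu(doc, user) for vdu in vdus)
```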

Cheers
Jan
—

> On 10. Mar 2019, at 15:51, Jan Lehnardt <j...@apache.org> wrote:
> 
> Hey all,
> 
> after mulling this over some more, I’d like to tackle the detailed API and 
> behaviour for this. Especially how _access works in conjunction with existing 
> access control features.
> 
> My guiding principles so far are:
> 
> 1. Make the API intuitive: things should work the way they look like they 
> should work.
> 2. The default should never be that a resource is accidentally left 
> accessible to the public.
> 3. This should work as a natural extension to the existing security features*.
> 
> * I’d be up for reworking the whole lot, too, but that might be a better 
> discussion for > 4.0.
> 
> 
> ## Database Creation and Default Behaviours
> 
> Creating a database with _access features is, as mentioned before, done via a 
> flag to PUT /database?access=true
> 
> In a 3.0 world where this would land, we already agreed that databases should 
> be admin-only by default (instead of world read/writeable today). This is a 
> sensible default, but that leaves us with an _access enabled database that 
> can’t be used by anyone but server or db admins. Not very useful.
> 
> To allow arbitrary users to use the db, I suggest we use the existing 
> _security system: i.e. if a user or a group a user belongs to is mentioned in 
> either `admins` or `members` inside of _security, they can proceed and create 
> documents on the db. This puts a second step burden on the application 
> developer, but it slots cleanly into the existing security mechanisms, and 
> doesn’t require special case handling. Alternatively, we could define that 
> _security isn’t available in _access enabled databases, but that’s something 
> I’d like to avoid if at all possible.
> 
> In order to make it easy to specify that “everyone in _users” should be able 
> to use the db, I suggest we add a new role `_users` that is valid inside 
> _security, which means “everyone in /_users” (this only excludes server 
> admins which have full access anyway).
> 
> * * *
> 
> 
> ## Document Creation and Access Control
> 
> Next, one of our non-admin users creates a doc. There are multiple options as 
> to how we store the _access information.
> 
> 1. Automatically translate the userCtx.name of a doc creation (not an update) 
> into the first element of the _access array. E.g. user_a PUT /db/doc {"a":1} 
> creates this doc: {"a":1,"_access":["user_a"]}. This is a little bit 
> counter-intuitive.
> 
> 2. We require that a user puts "_access":["user_a"] in themselves. This is an 
> explicit granting of access permissions on doc creation and I think is 
> preferable.
> 
> This leaves the edge case of docs that have no _access member: so far I 
> thought those docs are admin-only, with maybe a db-wide option to swap the 
> default to public access, but I think given the explicitness of 2. we can do 
> better: require _access for all new doc creations in access-enabled 
> databases. A user can not create a new document without an _access field that 
> is an array that has at least one member. For public documents, we could 
> invent a new role _public, and admin-only docs could use the existing role 
> _admin.
> 
> The one downside to this approach is that we won’t be able to replicate 
> existing databases into an access-enabled database without modifying all 
> documents. This might be a worthwhile trade-off, but we should make that 
> decision consciously and document it well. We could allow for a special case 
> where an _admin user can create docs that have no _access field, and those 
> docs are treated as having only the _admin role in _access. So at least we 
> could replicate all data in, but then require a manual step to update all 
> docs to say, migrate an existing db-per-user app, while not accidentally 
> exposing any docs to folks that shouldn’t read them.
> 
> For the rest of cRUD, the existing document must store one of the RUD-ing 
> user’s name or role in its _access field.
> 
> For both creations and updates, a user MUST supply at least one role they 
> belong to or their own username.
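
To make the creation rules above concrete, here is a small Python sketch. The function name `effective_access` and the error messages are invented for the example, and treating `_admin` users as exempt from the "own name or role" rule is my assumption, not something decided above.

```python
from typing import List


def effective_access(doc: dict, name: str, roles: List[str]) -> List[str]:
    """Return the _access list to store for a new doc, or raise ValueError."""
    is_admin = "_admin" in roles
    access = doc.get("_access")

    if access is None:
        # Special case from above: admins may omit _access (e.g. when
        # replicating an existing db in); the doc is then admin-only.
        if is_admin:
            return ["_admin"]
        raise ValueError("_access is required in access-enabled databases")

    if not isinstance(access, list) or len(access) == 0:
        raise ValueError("_access must be an array with at least one member")

    # Non-admins must list their own name or one of their roles.
    mentions_user = name in access or any(role in access for role in roles)
    if not is_admin and not mentions_user:
        raise ValueError("_access must contain your username or one of your roles")

    return access
```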
> 
> * * *
> 
> 
> ## _revs_diff
> 
> /db/_revs_diff can answer the question of which revisions of a document do 
> NOT exist on a replication target: 
> http://docs.couchdb.org/en/stable/api/database/misc.html#db-revs-diff
> 
> This would allow users to specify ids and rev(s) for docs they don’t have 
> access to (anymore), so the result schema should be expanded to handle id: 
> unauthorized or somesuch, something the replicator needs to know what to do 
> with, if it encounters it (say a user got removed from the _access list 
> in between the replicator opening _changes and requesting the doc).
> 
> The _revs_diff implementation would have to be altered to send an unauthorized 
> token for each doc the requesting userCtx has no access to. If we can re-use 
> some of our existing indexes, or any other performance optimisation, that’d 
> be great. I haven’t looked at that code at all, yet.
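
As a sketch of what the expanded result could look like (Python, purely illustrative): the `{"error": "unauthorized"}` entry is a placeholder for whatever token we end up picking, and the in-memory `docs` dict stands in for the real indexes. The `missing` key is what _revs_diff already returns today.

```python
from typing import Dict, List


def revs_diff(request: Dict[str, List[str]], docs: Dict[str, dict],
              name: str, roles: List[str]) -> Dict[str, dict]:
    """request maps doc id -> revs to check.
    docs maps doc id -> {"revs": set of known revs, "access": list} (hypothetical)."""
    who = {name, *roles}
    result: Dict[str, dict] = {}
    for doc_id, revs in request.items():
        doc = docs.get(doc_id)
        if doc is None:
            # Unknown doc: every requested rev is missing on this target.
            result[doc_id] = {"missing": list(revs)}
        elif "_admin" not in who and not who & set(doc["access"]):
            # Placeholder token for docs the requesting userCtx cannot access.
            result[doc_id] = {"error": "unauthorized"}
        else:
            missing = [rev for rev in revs if rev not in doc["revs"]]
            if missing:
                result[doc_id] = {"missing": missing}
    return result
```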
> 
> An important side-effect of this is, once a user has been added to a doc’s 
> _access list, they get access to “the full history of the doc”, even before 
> they had access. Of course, in CouchDB this means only getting access to the 
> rev ids, and not the content, but since they are content-addressable hashes, 
> a user could brute-force themselves into revealing certain real values from 
> earlier incarnations of the doc. I’d rather not track _access per document 
> revision in perpetuity, so this is something we have to be very up-front 
> about.
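
To illustrate the brute-force concern, a toy Python example. `toy_rev_hash` is a deliberately simplified stand-in: the real revision hash covers more than the JSON body, so this only shows the principle that a hash over guessable content can be confirmed by hashing candidates.

```python
import hashlib
import json
from typing import Iterable, Optional


def toy_rev_hash(body: dict) -> str:
    """Simplified stand-in for a content-derived rev id (not the real algorithm)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


def guess_body(known_rev_hash: str, candidates: Iterable[dict]) -> Optional[dict]:
    """Return the first candidate body whose hash matches the known rev hash."""
    for candidate in candidates:
        if toy_rev_hash(candidate) == known_rev_hash:
            return candidate
    return None


# A user who only sees the rev hash of an earlier revision, and knows the doc
# had a single "salary" field, can try plausible values until one matches.
leaked = toy_rev_hash({"salary": 52000})
found = guess_body(leaked, ({"salary": s} for s in range(50000, 60000, 1000)))
```

This only works when the space of plausible bodies is small, which is why it matters for short, low-entropy values.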
> 
> * * *
> 
> 
> ## Partitioned Databases
> 
> I mentioned partitioned databases in my previous mail, and I think it is 
> something we can document that end-users can opt into, but doesn’t require 
> any special casing on the _access proposal. That is, if users start prefixing 
> their doc ids with a user name or id and enable both _access and partitions, 
> then they get all the benefits of a partitioned database, and if they choose 
> not to, they don’t, but things keep working. There are enough use-cases to 
> warrant both behaviours.
> 
> * * *
> 
> 
> ## Scenarios that _access should help with.
> 
> Overall, we developed _access to allow users to stop using the db-per-user 
> architecture, but once we have per-doc-access control, folks might start 
> using this for all manner of things. We should be clear about which scenarios 
> we support and which we don’t.
> 
> 
> ### Scenario 1: db-per-user
> 
> Before _access enabled databases, the only way to allow mutually 
> untrusting users to store data in a part of CouchDB that only they (and 
> admins) have access to was giving each user their own database.
> 
> In an _access enabled database, users can CRUD/_changes/_all_docs/_revs_diff 
> their own docs knowing no other user (aside from admins) can access those 
> docs.
> 
> This is the simplest scenario, as all we’d have to do is track the owner of a 
> document and produce by-access-id/seq indexes based on that owner.
> 
> The current prototype implementation mostly reflects this stage. Not saying 
> this is what we should ship, but it is the easiest to implement and explain.
> 
> Aside, I might be able to be persuaded to ship this as a 2.x feature, to help 
> those folks who don’t need anything else.
> 
> 
> ### Scenario 2: db-per-user + Sharing
> 
> The second we allow per doc auth, users will want to share those docs with 
> other users. That’s why we initially suggested the _access field be an array, 
> so other users and groups can be specified to have access. There are multiple 
> scenarios in this one alone:
> 
> #### 2.1: The Todo List
> 
> In this scenario, a user has a reasonable amount of “personal data” that they 
> want to selectively share with one or more other users.
> 
> #### 2.2: The Chat/Forum/Newsgroup
> 
> In this scenario, a user wants to share any number of documents with a 
> reasonable number of groups. However, since we need to limit the number of 
> groups a user belongs to (currently 10, see below for details), this might 
> actually not be a great solution. Or folks couldn’t be in more than 10 chat 
> groups at a time.
> 
> #### 2.3: The Corporate Hierarchy
> 
> In this scenario, users want to share any number of docs with a reasonable 
> number of groups in a top-down/bottom-up fashion. Think CEO shares with 
> executives, execs share with divisions, divisions report up to their one 
> executive, etc.
> 
> 
> ### Scenario 3: Multiple Apps
> 
> The preceding scenarios all assume that a single application is responsible 
> for everything. However, once we allow mutually distrusting users into a 
> single database *and* make each per-user slice work (almost) like a full 
> standalone CouchDB database, what would stop users from using this for a 
> multi-homing feature, where different applications are used for each user in 
> the same database?
> 
> I’ll be referring to these scenarios down the line.
> 
> * * *
> 
> 
> ## Design Docs
> 
> ### Admin
> 
> One of the downsides of db-per-user is managing design docs in the face of a 
> changing application, that is, how to distribute new design docs across 10s 
> of 1000+s of user dbs? It’s not impossible, but tedious. In all scenarios 
> above but scenario 3., we could simplify this significantly. Say an admin 
> creates a design doc, and gives all users in the db access to this design doc 
> (this could be with the _users role, or yet another new role _members, if we 
> need it), requesting the result of a view defined in that design doc will 
> produce an index that is powered by the requesting user’s by-access-seq index 
> section(s).
> 
> N.B., this would require us to change a fundamental assumption when doing the 
> association between a design doc’s definition and index: normally, there is 
> only the `views` member that is hashed and that hash is used as the index’s 
> filename. Because there is only by-seq to power a view, that all works. But 
> now that we have an arbitrary set of sections on by-access-seq, any view 
> index built will have to take a user’s name and roles into account. When a 
> user leaves a group, or gains a group, all indexes for that user will no 
> longer be valid and need rebuilding.
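
A sketch of what "taking a user's name and roles into account" could mean for the index signature, in Python. The function name and the exact payload layout are assumptions for illustration; the only point is that the hash input grows from `views` alone to `views` plus name plus roles.

```python
import hashlib
import json
from typing import List


def index_signature(views: dict, name: str, roles: List[str]) -> str:
    """Hash the views definition together with the user's name and roles.

    Roles are de-duplicated and sorted so that their order does not change
    the signature; any change in membership does.
    """
    payload = json.dumps(
        {"views": views, "name": name, "roles": sorted(set(roles))},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

With a signature like this, a user gaining or leaving a group yields a different index file name, which is exactly the rebuild (and the orphaned old index) described above.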
> 
> 
> ### User
> 
> In any of the scenarios above, but especially 3., there could be legitimate 
> per-user design docs, so how should those be treated in an _access enabled 
> database?
> 
> The significant fields in a design doc are `views`, `validate_doc_update` and 
> `filters` (I’ll skip over the deprecated _show, _list, and _update).
> 
> The easiest to handle is `filters`: if a user specifies a filter for a 
> _changes request or replication that lives in a design doc they don’t have 
> access to, they get an error, similar to if they specify a non-existent 
> design doc, just with `unauthorized` instead of `not_found`.
> 
> Next `views` is also not very hard to imagine working: just like globally 
> defined views for that db, the index is built for each user based on the 
> user’s name and roles.
> 
> More troubling are `validate_doc_update` functions: One, they are already 
> troubling in that they slow down any document updates. Two, if we now import 
> an existing db-per-user scenario where each user has their own design docs, 
> how should we apply validate_doc_update functions? 10s of 1000s of VDUs are 
> impractical to apply on each doc update, let alone just the management of 
> VDUs that are active on a database. One option would be to ignore VDUs if 
> they are not defined globally (say with a _members role). But especially in 
> scenario 3. this becomes problematic, but even without that specific 
> scenario, this violates the no surprises best practice.
> 
> We could say:
> 
> a) we don’t support scenario 3.
> b) we find a complicated but efficient way to apply only those VDUs that are 
> defined in design docs the writing user has access to plus any global ones 
> (this would be neat but rather complicated and potentially still impractical 
> from a performance perspective for N users).
> c) we could store all per-user design docs, but ignore them completely, VDUs, 
> views and filters.
> 
> I think I currently fall on the side of not supporting scenario 3. and asking 
> folks who migrate db-per-user to de-duplicate design docs and keep them 
> per-app. I believe that is a good trade-off between the most common scenarios 
> for db-per-user while keeping the implementation manageable. Globally 
> accessible design docs would show up in a user’s changes feed and would 
> replicate down to say a PouchDB application which might be the exclusive user 
> of those design docs.
> 
> In practice this would mean, a document that has an _id that starts with 
> _design/ will have to be produced by a database admin. Luckily, that’s 
> already the case. We should just make sure that folks don’t give db-admin 
> access to all users habitually.
> 
> 
> ## Read and Write Access
> 
> Speaking of validate_doc_update, it is used for two things: checking document 
> schema and doc update authorisation.
> 
> Once we allow access to a document with an _access field, we need to decide 
> what kind of access this gives to a doc: read-only or read-write (I’m not 
> considering write-only because for anything but doc creations this is not 
> useful as you need access to the current _rev).
> 
> However, when we look at implementing an application on top of our existing 
> API, it is already weird that read access can be controlled globally (or with 
> _access on a per doc level), but write access requires writing JavaScript 
> code. I think it would be reasonable for users to expect a per-doc 
> read/write permission granting.
> 
> So we could have all of the above, but with two extra fields: _access_read 
> and _access_write, or _access: {read: [], write: []} or we overload user and 
> group names: _access: [user_a:read, user_b:write] (or any permutation 
> thereof). Overloading can cause trouble with naturally occurring characters 
> in group names.
> 
> The former seems more explicit, but from an API perspective that’s a little 
> more awkward: remember that we currently have an arbitrary limit of 10 
> members in a user’s role array, to avoid excessive fan out on 
> cluster-internal operations. Partitioned dbs could get away with more, more 
> easily however. If we allow the specification of access control in two lists, 
> and one of the lists implies membership in the other, we have a total limit 
> of 10 members across both arrays. Or we limit 5 + 5, but that seems 
> excessive, while 10 total seems weird, but doable. Anyway, good bikeshed.
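
For the `_access: {read: [], write: []}` variant, a Python sketch of "write implies read" with a shared limit. Counting distinct entries across both lists towards the limit of 10 is my assumption here; the function names are made up.

```python
from typing import List

MAX_ENTRIES = 10  # the arbitrary limit mentioned above, applied across both lists


def check_access_field(access: dict) -> None:
    """Raise ValueError if read and write together name more than MAX_ENTRIES."""
    entries = set(access.get("read", [])) | set(access.get("write", []))
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"_access may name at most {MAX_ENTRIES} users/roles in total")


def can_write(access: dict, name: str, roles: List[str]) -> bool:
    """Write access requires being listed under write."""
    return bool({name, *roles} & set(access.get("write", [])))


def can_read(access: dict, name: str, roles: List[str]) -> bool:
    """Being listed under write implies read access as well."""
    who = {name, *roles}
    return bool(who & set(access.get("read", []))) or can_write(access, name, roles)
```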
> 
> 
> * * * 
> 
> 
> So far, I think all of the problems outlined are solvable, if only with a 
> clear definition of what use-cases we do not support with access. If you have more 
> scenarios than the ones I outlined, please add them and we can see if they 
> cause any additional trouble.
> 
> Thanks for reading this far and I’m looking forward to your feedback.
> 
> 
> Best,
> Jan “_access” Lehnardt
> —
> 
> 
> 
> 
>> On 17. Feb 2019, at 15:25, Jan Lehnardt <j...@apache.org> wrote:
>> 
>> Hi Everyone,
>> 
>> I’m happy to share my work in progress attempt to implement the per-doc 
>> access control feature we discussed a good while ago:
>> 
>> https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
>>  
>> <https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E>
>> 
>> You can check out my branch here:
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1 
>> <https://github.com/apache/couchdb/compare/access?expand=1>
>> 
>> It is very much work in progress, but it is far enough along to warrant 
>> discussion.
>> 
>> The main point of this branch is to show all the places that we would need 
>> to change to support the proposal.
>> 
>> Things I’ve left for later:
>> 
>> - currently only the first element in the _access array is used. Our and/or 
>> syntax can be added later.
>> - building per-access views has not been implemented yet, couch_index would 
>> have to be taught about the new per-access-id index.
>> - pretty HTTP error handling
>> - tests except for a tiny shell script 😇
>> 
>> Implementation notes:
>> 
>> You create a database with the _access feature turned on like so:  PUT 
>> /db?access=true
>> 
>> I started out with storing _access in the document body, as that would allow 
>> for a minimal change set, however, on doc updates, we try hard not to load 
>> the old doc body from the database, and forcing us to do so for EVERY doc 
>> update under _access seemed prohibitive, so I extended the #doc, #doc_info 
>> and #full_doc_info records with a new `access` attribute that is stored in 
>> both by-id and by-seq. I will need guidance on how extending these records 
>> impacts multi-version cluster interop. And especially whether this is an 
>> acceptable approach.
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36
>> 
>> * * *
>> 
>> The main addition is a new native query server called 
>> couch_access_native_proc, which implements two new indexes by-access-id and 
>> by-access-seq which do what you’d expect, pass in a userCtx and retrieve the 
>> equivalent of _all_docs or _changes, but only including those docs that 
>> match the username and roles in their _access property. The existing 
>> handlers for _all_docs and _changes have been augmented to use the new 
>> indexes instead of the default ones, unless the user is an admin.
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4
>> 
>> * * *
>> 
>> The rest of the diff is concerned with making document CRUD behave as you’d 
>> expect it. See this little demonstration for what things look like:
>> 
>> https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497 
>> <https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497> (I’m just 
>> noticing that there might be something wonky with DELETE, but you’ll get the 
>> gist #rimshot)
>> 
>> * * *
>> 
>> Open questions:
>> 
>> - The aim of this is to get as close to regular CouchDB behaviour as 
>> possible. One thing that is new, however, and would require all apps to be 
>> changed, is that docs in an _access enabled database have to include an 
>> _access field (docs with no _access are admin-only for now). We might want 
>> to consider on new document writes to auto-insert the authenticated user’s 
>> name as the first element in the _access array, so existing apps “just work”.
>> 
>> - Interplay with partitioned dbs: eschewing db-per-user is already a large 
>> boon if you have a lot of users, but making those per-user requests inside 
>> an _access enabled database efficient would be doubly nice, so why not use 
>> the username from the first question above and use that as the partition 
>> key? This would work nicely for natural users with their own docs that want 
>> to share them with others later, but I can easily imagine a pipelined use of 
>> CouchDB, where a “collector” user creates all new docs, an “analyser” takes 
>> them over and hands them to a “result” user for viewing. In that case, we’d 
>> violate the high-cardinality rule of partitions (have a lot of small ones), 
>> instead all docs go through all three users. I’d be okay with treating the 
>> latter scenario as a minor use-case, but for that use-case, we should be able 
>> to disable auto-partitioning on db creation.
>> 
>> - building access view indexes for docs that have frequent _access changes 
>> leads to many orphaned view indexes; we should look at an auto-cleanup 
>> solution here (maybe keep 1-N indexes in case folks just swap back and 
>> forth).
>> 
>> * * *
>> 
>> I’ll leave this here for now, I’m sure there are a few more things to 
>> consider.
>> 
>> I’d love to hear any and all feedback you might have. Especially if anything 
>> is unclear.
>> 
>> Best
>> Jan
>> —
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
