Hi Everyone,

I’m happy to share my work-in-progress attempt to implement the per-doc access control feature we discussed a good while ago:

https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E

You can check out my branch here:

https://github.com/apache/couchdb/compare/access?expand=1

It is very much a work in progress, but it is far enough along to warrant discussion. The main point of this branch is to show all the places that we would need to change to support the proposal.

Things I’ve left for later:

- Currently only the first element in the _access array is used. Our and/or syntax can be added later.
- Building per-access views has not been implemented yet; couch_index would have to be taught about the new per-access-id index.
- Pretty HTTP error handling.
- Tests, except for a tiny shell script 😇

Implementation notes:

You create a database with the _access feature turned on like so:

    PUT /db?access=true

I started out storing _access in the document body, as that would allow for a minimal change set. However, on doc updates we try hard not to load the old doc body from the database, and forcing us to do so for EVERY doc update under _access seemed prohibitive. So I extended the #doc, #doc_info and #full_doc_info records with a new `access` attribute that is stored in both by-id and by-seq.

I will need guidance on how extending these records impacts multi-version cluster interop, and especially on whether this is an acceptable approach.

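To make the intended read semantics concrete, here is a rough Python sketch. It is not code from the branch, which is Erlang. It models the rules as stated in this mail: admins see everything, docs without _access are admin-only, and for now only the first element of the _access array is consulted. The function names `can_read` and `visible_docs` are hypothetical, and treating an _access entry as matching either the user name or one of the user's roles is an assumption about how the matching works.

```python
def can_read(doc, user_ctx):
    """Return True if user_ctx may see doc under the simplified _access rules."""
    roles = user_ctx.get("roles", [])
    # Admins bypass the per-access check entirely.
    if "_admin" in roles:
        return True
    access = doc.get("_access") or []
    # Docs with no _access entries are admin-only for now.
    if not access:
        return False
    # Work-in-progress limitation: only the first element is used.
    first = access[0]
    # Assumption: an entry matches the user name or any of the user's roles.
    return first == user_ctx.get("name") or first in roles


def visible_docs(docs, user_ctx):
    """Filter docs the way a per-user _all_docs response would (illustrative)."""
    return [doc for doc in docs if can_read(doc, user_ctx)]
```

For example, a doc with `"_access": ["alice"]` would be visible to a userCtx named alice and to admins, but not to bob, even if bob appeared later in the array.
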
The relevant part of the diff for the record changes:

https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36

* * *

The main addition is a new native query server called couch_access_native_proc, which implements two new indexes, by-access-id and by-access-seq. They do what you’d expect: pass in a userCtx and retrieve the equivalent of _all_docs or _changes, but only including those docs that match the username and roles in their _access property. The existing handlers for _all_docs and _changes have been augmented to use the new indexes instead of the default ones, unless the user is an admin.

https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4

* * *

The rest of the diff is concerned with making document CRUD behave as you’d expect. See this little demonstration for what things look like:

https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497

(I’m just noticing that there might be something wonky with DELETE, but you’ll get the gist #rimshot)

* * *

Open questions:

- The aim of this is to get as close to regular CouchDB behaviour as possible. One thing that is new, however, and that would require all apps to be changed, is that in an _access-enabled database, apps have to include an _access field in their docs (docs with no _access are admin-only for now). We might want to consider auto-inserting the authenticated user’s name as the first element in the _access array on new document writes, so existing apps “just work”.
- Interplay with partitioned dbs: eschewing db-per-user is already a large boon if you have a lot of users, but making those per-user requests inside an _access-enabled database efficient would be doubly nice, so why not take the username from the first question above and use it as the partition key?
  This would work nicely for natural users with their own docs who want to share them with others later, but I can easily imagine a pipelined use of CouchDB, where a “collector” user creates all new docs, an “analyser” takes them over and hands them to a “result” user for viewing. In that case, we’d violate the high-cardinality rule of partitions (have a lot of small ones); instead, all docs go through all three users. I’d be okay with treating the latter scenario as a minor use-case, but for that use-case, we should be able to disable auto-partitioning on db creation.
- Building access view indexes for docs that have frequent _access changes leads to many orphaned view indexes. We should look at an auto-cleanup solution here (maybe keep 1-N indexes in case folks just swap back and forth).

* * *

I’ll leave this here for now; I’m sure there are a few more things to consider. I’d love to hear any and all feedback you might have, especially if anything is unclear.

Best
Jan
--