[DISCUSS] Per-doc access control

Jan Lehnardt Sun, 17 Feb 2019 06:26:35 -0800

Hi Everyone,

I’m happy to share my work-in-progress attempt to implement the per-doc access 
control feature we discussed a good while ago:


https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
 

You can check out my branch here:

https://github.com/apache/couchdb/compare/access?expand=1 

It is very much work in progress, but it is far enough along to warrant 
discussion.

The main point of this branch is to show all the places that we would need to 
change to support the proposal.

Things I’ve left for later:

- currently only the first element in the _access array is used. Our and/or 
syntax can be added later.
- building per-access views has not been implemented yet; couch_index would 
have to be taught about the new per-access-id index.
- pretty HTTP error handling
- tests except for a tiny shell script 😇

Implementation notes:

You create a database with the _access feature turned on like so:  PUT 
/db?access=true
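
As a rough illustration of that request, here is a minimal Python sketch that 
only builds the PUT without sending it. The base URL http://localhost:5984 and 
the helper name create_access_db_request are assumptions made for this example; 
only the `access=true` query parameter comes from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_access_db_request(base_url: str, db_name: str) -> Request:
    """Build (but do not send) the PUT that creates an _access-enabled db."""
    # access=true is the switch described above; everything else is assumed.
    query = urlencode({"access": "true"})
    return Request(f"{base_url}/{db_name}?{query}", method="PUT")


# Assumed local node; replace with your own host and credentials.
req = create_access_db_request("http://localhost:5984", "db")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; 
authentication is left out of the sketch.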

I started out storing _access in the document body, as that would allow for a 
minimal change set. However, on doc updates we try hard not to load the old doc 
body from the database, and forcing us to do so for EVERY doc update under 
_access seemed prohibitive. So I extended the #doc, #doc_info and 
#full_doc_info records with a new `access` attribute that is stored in both 
by-id and by-seq. I will need guidance on how extending these records impacts 
multi-version cluster interop, and especially on whether this is an acceptable 
approach.

https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36

* * *

The main addition is a new native query server called couch_access_native_proc, 
which implements two new indexes, by-access-id and by-access-seq. They do what 
you’d expect: pass in a userCtx and retrieve the equivalent of _all_docs or 
_changes, but only including those docs that match the username and roles in 
their _access property. The existing handlers for _all_docs and _changes have 
been augmented to use the new indexes instead of the default ones, unless the 
user is an admin.
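
The visibility rule described here can be sketched in a few lines of Python. 
This is not the Erlang code from the branch, just an illustration under stated 
assumptions: the function name can_read is made up, admins are recognised by 
the `_admin` role, only the first _access element is consulted (as noted under 
"Things I’ve left for later"), and a role match is treated the same way as a 
username match.

```python
def can_read(doc: dict, user_ctx: dict) -> bool:
    """Return True if user_ctx may see doc under the assumed _access rule."""
    roles = user_ctx.get("roles", [])
    # Admins bypass the per-access indexes entirely.
    if "_admin" in roles:
        return True
    access = doc.get("_access") or []
    # Docs with no _access are admin-only for now.
    if not access:
        return False
    # Only the first element is used until and/or syntax exists.
    first = access[0]
    return first == user_ctx.get("name") or first in roles
```

For example, can_read({"_access": ["jan"]}, {"name": "jan", "roles": []}) is 
true, while the same doc is hidden from a user named "bob" with no roles.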

https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4

* * *

The rest of the diff is concerned with making document CRUD behave as you’d 
expect. See this little demonstration for what things look like:

https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497

(I’m just noticing that there might be something wonky with DELETE, but you’ll 
get the gist #rimshot)

* * *

Open questions:

- The aim of this is to get as close to regular CouchDB behaviour as possible. 
One thing that is new, however, and that would require all apps to be changed, 
is that docs in an _access-enabled database have to include an _access field 
(docs with no _access are admin-only for now). We might want to consider 
auto-inserting the authenticated user’s name as the first element in the 
_access array on new document writes, so existing apps “just work”.

- Interplay with partitioned dbs: eschewing db-per-user is already a large boon 
if you have a lot of users, but making those per-user requests inside an 
_access-enabled database efficient would be doubly nice. So why not take the 
username from the first question above and use it as the partition key? This 
would work nicely for natural users with their own docs who want to share them 
with others later. But I can easily imagine a pipelined use of CouchDB, where a 
“collector” user creates all new docs, and an “analyser” takes them over and 
hands them to a “result” user for viewing. In that case, we’d violate the 
high-cardinality rule of partitions (have a lot of small ones); instead, all 
docs go through all three users. I’d be okay with treating the latter scenario 
as a minor use-case, but for that use-case, we should be able to disable 
auto-partitioning on db creation.

- Building access view indexes for docs that have frequent _access changes 
leads to many orphaned view indexes. We should look at an auto-cleanup solution 
here (maybe keep 1-N indexes in case folks just swap back and forth).
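
To make the first open question more concrete, the auto-insert idea could look 
roughly like the Python sketch below. It is only an illustration of the 
proposed behaviour, not code from the branch: the function name 
with_default_access is hypothetical, and leaving anonymous writes untouched is 
an assumption.

```python
def with_default_access(doc: dict, user_ctx: dict) -> dict:
    """Return doc with _access defaulted to the writing user's name."""
    # An explicit, non-empty _access from the client always wins.
    if doc.get("_access"):
        return doc
    name = user_ctx.get("name")
    # Assumption: anonymous writes are left as-is (admin-only for now).
    if name is None:
        return doc
    # Copy rather than mutate, so the caller's doc is unchanged.
    return {**doc, "_access": [name]}
```

With this in place, an existing app that writes {"_id": "a"} as user "jan" 
would end up storing a doc whose _access is ["jan"].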

* * *

I’ll leave this here for now; I’m sure there are a few more things to consider.

I’d love to hear any and all feedback you might have. Especially if anything is 
unclear.

Best
Jan
—
