Heya Garren,

thanks for having a look. From a code-organisation perspective, some of my
edits can easily live in a separate app rather than in src/couch; that’s
mostly a code-organisation task, which I’m happy to do. The epi suggestion
surely helps with the handler overrides.

Some of the changes however have to be in core CouchDB, specifically the
storing of _access information on the various doc records, in order to
ensure efficient updates. That’s not something a fully external app can
manage. Whether it’s an extra field on those records or rather an entry
in the existing meta field is secondary, but this needs propagating into
by-id (and maybe also by-seq).
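
To make the update-cost argument concrete, here is a minimal Python sketch.
It is not CouchDB code: `DocInfo`, `ToyDb` and `may_access` are hypothetical
names, and the rule "any _access entry matching the user name or a role
grants access" is an assumption made for illustration. It only shows that
when the access list sits next to the doc metadata in by-id, an update can
be authorised without reading the old body.

```python
from dataclasses import dataclass, field


@dataclass
class DocInfo:
    """Hypothetical stand-in for a by-id entry that carries the access list."""
    doc_id: str
    rev: int = 0
    access: list = field(default_factory=list)


def may_access(user, roles, access):
    # Assumption: any _access entry naming the user or one of their roles grants access.
    return any(entry == user or entry in roles for entry in access)


class ToyDb:
    def __init__(self):
        self.by_id = {}       # doc_id -> DocInfo (metadata only)
        self.bodies = {}      # doc_id -> document body
        self.body_reads = 0   # counts how often an old body had to be loaded

    def read_body(self, doc_id):
        self.body_reads += 1
        return self.bodies[doc_id]

    def update(self, user, roles, doc_id, body, access):
        info = self.by_id.get(doc_id)
        if info is None:
            # New document: nothing to check against yet.
            info = DocInfo(doc_id)
            self.by_id[doc_id] = info
        elif not may_access(user, roles, info.access):
            # The check uses only the by-id metadata; read_body() is never called.
            raise PermissionError(f"{user} may not update {doc_id}")
        info.rev += 1
        info.access = list(access)
        self.bodies[doc_id] = body
        return info.rev
```

If the access list lived only in the body, `update` would have to call
`read_body` before every write to an existing doc.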

I’m not sure about your suggestion to listen to all access=true DBs’
_changes feeds to generate the required indexes. That sounds like building
a new mini couch_mrview/couch_index rather than re-using that infrastructure
with minimal edits.
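
As a rough illustration of the bookkeeping such a listener would take on,
here is a toy Python sketch of a by-access index fed from a changes feed.
Everything in it is an assumption for the sake of the example: the class
name `ToyAccessIndex`, integer sequence numbers, and change rows that carry
an `access` list are all hypothetical and not what CouchDB emits.

```python
from collections import defaultdict


class ToyAccessIndex:
    """Hypothetical by-access-id index maintained from a changes feed."""

    def __init__(self):
        self.by_access = defaultdict(set)  # access id -> doc ids visible to it
        self.doc_access = {}               # doc id -> access ids currently indexed
        self.last_seq = 0                  # checkpoint of the last processed change

    def apply_change(self, change):
        seq, doc_id = change["seq"], change["id"]
        if seq <= self.last_seq:
            return  # already processed, e.g. after resuming from a checkpoint
        # Remove the stale entries left by the previous revision of this doc.
        for access_id in self.doc_access.pop(doc_id, ()):
            self.by_access[access_id].discard(doc_id)
        if not change.get("deleted"):
            access = list(change.get("access", []))
            for access_id in access:
                self.by_access[access_id].add(doc_id)
            self.doc_access[doc_id] = access
        self.last_seq = seq

    def all_docs_for(self, user, roles):
        # Union of the docs indexed under the user name and under each role.
        ids = set(self.by_access.get(user, ()))
        for role in roles:
            ids |= self.by_access.get(role, set())
        return sorted(ids)
```

Checkpointing, stale-entry removal and per-key lookups are the kind of work
an indexer has to do, and this sketch has to do all of it by hand.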

As for the FDB option, going through the code so far has helped me understand
all the building blocks required and I think adding this to FDB CouchDB
would maybe take a week total (i.e. be significantly easier), so I’m not
aiming to re-use much for that implementation other than the future test
suite.

That said, I’m very much not married to my existing code, and I’d love to hear
any and all ways to simplify things.

Best
Jan
—


> On 26. Feb 2019, at 11:18, Garren Smith <gar...@apache.org> wrote:
> 
> Hi Jan,
> 
> I've been giving this some thought and I wonder if we should take a step
> back and rethink how we do this. Instead of implementing this directly in
> the CouchDB core code, it might be better to write this as a separate
> application, similar to Dreyfus - Cloudant's search[1]. I'm hoping that
> you then wouldn't have to make huge modifications to the CouchDB codebase,
> which should make this easier to do.
> The application would override the _all_docs and _changes endpoints, and if
> a user has enabled access=true for that database, you could then serve
> the _all_docs and _changes requests from your application. The epi http
> work is pretty fancy; I think we could do some cool things around that to
> make this work well. The app would listen to the changes feeds of any
> database that has access=true and then implement the required indexes for
> _all_docs and _changes. I think we then would not have to create a custom
> indexer as we could build the indexes when new changes arrive.
> 
> I'm also hoping that another advantage of doing this as an app that listens
> to the changes feed is that there should be minimal work to get this to
> work when we switch to fdb.
> 
> This is obviously just an idea I had and I thought I would share it, not in
> an attempt to derail what you're doing, but hopefully in an attempt to make
> sure we find the easiest and most effective way to get this done.
> 
> Cheers
> Garren
> 
> 
> [1] https://github.com/cloudant-labs/dreyfus
> 
> On Sun, Feb 17, 2019 at 4:25 PM Jan Lehnardt <j...@apache.org> wrote:
> 
>> Hi Everyone,
>> 
>> I’m happy to share my work in progress attempt to implement the per-doc
>> access control feature we discussed a good while ago:
>> 
>> 
>> https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
>> 
>> You can check out my branch here:
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1
>> 
>> It is very much work in progress, but it is far enough along to warrant
>> discussion.
>> 
>> The main point of this branch is to show all the places that we would need
>> to change to support the proposal.
>> 
>> Things I’ve left for later:
>> 
>> - currently only the first element in the _access array is used. Our
>> and/or syntax can be added later.
>> - building per-access views has not been implemented yet, couch_index
>> would have to be taught about the new per-access-id index.
>> - pretty HTTP error handling
>> - tests except for a tiny shell script 😇
>> 
>> Implementation notes:
>> 
>> You create a database with the _access feature turned on like so:  PUT
>> /db?access=true
>> 
>> I started out with storing _access in the document body, as that would
>> allow for a minimal change set, however, on doc updates, we try hard not to
>> load the old doc body from the database, and forcing us to do so for EVERY
>> doc update under _access seemed prohibitive, so I extended the #doc,
>> #doc_info and #full_doc_info records with a new `access` attribute that is
>> stored in both by-id and by-seq. I will need guidance on how extending
>> these records impacts multi-version cluster interop, and especially on
>> whether this is an acceptable approach.
>> 
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36
>> 
>> * * *
>> 
>> The main addition is a new native query server called
>> couch_access_native_proc, which implements two new indexes, by-access-id and
>> by-access-seq, which do what you’d expect: pass in a userCtx and retrieve
>> the equivalent of _all_docs or _changes, but only including those docs that
>> match the username and roles in their _access property. The existing
>> handlers for _all_docs and _changes have been augmented to use the new
>> indexes instead of the default ones, unless the user is an admin.
>> 
>> 
>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4
>> 
>> * * *
>> 
>> The rest of the diff is concerned with making document CRUD behave as
>> you’d expect it. See this little demonstration for what things look like:
>> 
>> https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497
>> 
>> (I’m just noticing that there might be something wonky with DELETE, but
>> you’ll get the gist #rimshot)
>> 
>> * * *
>> 
>> Open questions:
>> 
>> - The aim of this is to get as close to regular CouchDB behaviour as
>> possible. One thing that is new, however, and that would require all apps
>> to be changed, is that docs in an _access enabled database must include an
>> _access field (docs with no _access are admin-only for now). We might want
>> to consider auto-inserting the authenticated user’s name as the first
>> element in the _access array on new document writes, so existing apps
>> “just work”.
>> 
>> - Interplay with partitioned dbs: eschewing db-per-user is already a large
>> boon if you have a lot of users, but making those per-user requests inside
>> an _access enabled database efficient would be doubly nice, so why not use
>> the username from the first question above and use that as the partition
>> key? This would work nicely for natural users with their own docs that want
>> to share them with others later, but I can easily imagine a pipelined use
>> of CouchDB, where a “collector” user creates all new docs, an “analyser”
>> takes them over and hands them to a “result” user for viewing. In that case,
>> we’d violate the high-cardinality rule of partitions (have a lot of small
>> ones); instead, all docs go through all three users. I’d be okay with
>> treating the latter scenario as a minor use-case, but for that use-case, we
>> should be able to disable auto-partitioning on db creation.
>> 
>> - Building access view indexes for docs that have frequent _access
>> changes leads to many orphaned view indexes; we should look at an
>> auto-cleanup solution here (maybe keep 1-N indexes in case folks just swap
>> back and forth).
>> 
>> * * *
>> 
>> I’ll leave this here for now, I’m sure there are a few more things to
>> consider.
>> 
>> I’d love to hear any and all feedback you might have. Especially if
>> anything is unclear.
>> 
>> Best
>> Jan
>> —

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
