Re: [DISCUSS] Per-doc access control

Adam Kocoloski Tue, 26 Feb 2019 15:38:58 -0800

Mike,

If I’m reading you correctly, you’re concerned about cross-domain
authentication. It’s a good problem and worth discussing, but I think it’s
cleanly decoupled from the per-doc access control work, which is focused on
*authorization*.


Garren,

A lot of the complexity in per-doc access controls comes down to ensuring the 
view system does not inadvertently allow the exposure of information that it 
should not expose. Did you have views in scope when you were thinking about how 
this would work as an external application?

Adam

> On Feb 26, 2019, at 1:36 PM, Michael Fair <[email protected]> wrote:
> 
> One thing I've always been concerned about when it comes to "user based"
> document access is how it interacts with cross-domain replication and
> mobile replication (or "user sliced" replication).
> 
> I believe this requires some kind of concept of a "Decentralized ID", or
> a per-database mapping of user ids (roles) to system users (users), because
> the idea of a "user", and who the set of users is, isn't likely to be the
> same across administrative domains.
> 
> Picture the same database replicating between two administrative
> domains: say your domain and my domain ran two separate Couch instances
> and we shared a common database, for example a "product catalog" with
> per-user access controls turned on, so your organization can edit products,
> their descriptions, and prices, mine can create purchase orders, and we
> can both update "tickets/issues/returns".
> 
> I think it's super important to figure out how this case ought to behave
> when considering the user access design. We will most likely have
> different user sets in our Couch server installations.
> 
> I suggest that all document databases have only the concept of role ids,
> not user ids, internally, and that privileges be granted to those roles.
> Getting access to a database would require creating both a database role
> and a system user and linking the two; the default role is called
> "public", which by default has read/write access to all documents, and
> users are automatically linked/mapped to it.
> 
> It's also fine to name a database role the same as a system userid (e.g.
> the 'mfair' role), but the database would not have a concept of a "user"
> (authentication), only the "role" (authorization). The system daemon
> handles authenticating users, and there's a mapping of system users to
> database roles (e.g. the 'mfair' system user could map to the 'mfair'
> database role). The roles (and access privileges) would replicate with the
> database, but the system users would not.
> 
> This isn't a completely baked thought yet, but I think it goes in the right
> direction.
> 
> Each independent server in a separate administrative domain would have its
> own set of users; each server then has to map those system users to
> database roles. This should only have to be done once.
> 
> Couch servers could also choose to replicate their user databases too.
> Multiple Couch servers that replicate their user databases with each other
> is what I'm calling an "administrative domain" or simply "domain" for short.
> 
> A user can also be mapped to (i.e. hold) multiple roles in the same
> database simultaneously.
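
To make the role-mapping idea above concrete, here is a minimal Python sketch. All names (`DatabaseRoles`, `ServerUserMap`, `allowed`) and the privilege strings are hypothetical, not a CouchDB API. The assumption is that database roles and their privileges travel with the database, while each server keeps its own non-replicated mapping from local system users to those roles, and every user implicitly holds the default "public" role.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatabaseRoles:
    """Hypothetical role table stored in (and replicated with) a database."""

    # role name -> privileges granted to that role
    privileges: dict[str, set[str]] = field(
        default_factory=lambda: {"public": {"read", "write"}}
    )


@dataclass
class ServerUserMap:
    """Hypothetical per-server mapping of system users to database roles.

    This lives with the server's own user database and does NOT replicate.
    """

    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def map_user(self, user: str, role: str) -> None:
        # A user may hold several roles in the same database.
        self.user_roles.setdefault(user, set()).add(role)

    def roles_for(self, user: str) -> set[str]:
        # Every user is implicitly linked to the default "public" role.
        return self.user_roles.get(user, set()) | {"public"}


def allowed(db: DatabaseRoles, users: ServerUserMap, user: str, privilege: str) -> bool:
    """True if any role the user holds in this database grants `privilege`."""
    return any(
        privilege in db.privileges.get(role, set()) for role in users.roles_for(user)
    )
```

Tightening access is then a change to replicated data: with `DatabaseRoles({"public": {"read"}, "mfair": {"read", "write"}})` and a server that maps its local `mfair` user to the `mfair` role, `mfair` can write while every other local user can only read.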
> 
> This model is very close to how MS SQL Server and other SQL databases
> handle users.
> 
> ..........
> Other ideas include:
> - Disclaim that replication with other domains simply breaks user access
> controls. It's too complicated, so it's not supported.
> 
> - Replication can only be done within the context of a role that has
> read/write access to the database's documents (a new replication
> parameter).  Authenticating this login for automated replication might be
> tricky (remote user credentials stored in the replication database?)...
> 
> - Putting all the users inside the database itself so that those ids
> replicate in addition to the contents, but then every server has to be
> entrusted to authenticate every user in every database it shares with other
> domains, and the set of users becomes the superset of all users across all
> participating domains. Being able to reset a user's password in another
> domain because you have access to manipulating the database's users seems
> "wrong"...
> 
> - Using a decentralized p2p identity scheme like PKI and using Couch itself
> as a distributed public key store. This has the advantage that docs can be
> encrypted and their decryption secrets protected by keypairs, so remote
> databases can't automatically read contents they shouldn't... It's obviously more
> complicated than simply trusting the remote administrators and human beings
> are notoriously bad at safely keeping secrets (they either end up sharing
> them or losing/forgetting them).
> 
> - Make an executive decision that CouchDB no longer has a primary use case
> for multimaster replication across administrative domains. This feature is
> what has always set Couch apart for me: replicating documents between
> decentralized administrative domains instead of only being a centralized
> document repository for a single organization. I get that folks like IBM
> and other large single-organization installations really don't care about
> replicating/sharing their data with third-party organizations, and that
> sharing a multimaster distributed database across administrative domains is
> less common than a single organization with its own large private
> repository and set of users; but I really like Couch specifically for the
> cross-domain replication use case. I think it's a medium-term problem that
> people are looking for solutions to, and it's non-trivial to solve.
> How can we share "records" securely 'between' many organizations instead of
> each organization trying to keep their own separate data instance copies in
> sync with each other?  I think Couch, and the Couch replication protocol,
> is a leading contender in addressing that challenge.
> 
> I bring this up now because I think whatever approach is used to address the
> cross-domain authorization issue will have a huge influence on the
> feature's design (alongside other factors).
> 
> Thanks,
> Mike
>> On Feb 26, 2019 3:39 AM, "Jan Lehnardt" <[email protected]> wrote:
>> 
>> Heya Garren,
>> 
>> thanks for having a look. From a code-organisation perspective, some of my
>> edits can easily live in a separate app vs. src/couch; that's mostly a
>> code-organisation task which I’m happy to do. The epi suggestion surely
>> helps with the handler overrides.
>> 
>> Some of the changes however have to be in core CouchDB, specifically the
>> storing of _access information on the various doc records, in order to
>> ensure efficient updates. That’s not something a fully external app can
>> manage. Whether it’s an extra field on those records or rather an entry
>> in the existing meta field is secondary, but this needs propagating into
>> by-id (and maybe also by-seq).
>> 
>> I’m not sure about your suggestion to listen to all access=true DBs’
>> _changes feeds to generate the required indexes. That sounds like building
>> a new mini couch_mrview/couch_index rather than re-using that
>> infrastructure with minimal edits.
>> 
>> As for the FDB option, going through the code this far helped me understand
>> all the building blocks required and I think adding this to FDB CouchDB
>> would maybe take a week total (i.e. be significantly easier), so I’m not
>> aiming to re-use much for that implementation other than the future test
>> suite.
>> 
>> That said, I’m very much not married to my existing code, and I’d love to
>> hear any and all ways to simplify things.
>> 
>> Best
>> Jan
>> —
>> 
>> 
>>> On 26. Feb 2019, at 11:18, Garren Smith <[email protected]> wrote:
>>> 
>>> Hi Jan,
>>> 
>>> I've been giving this some thought and I wonder if we should take a step
>>> back and rethink how we do this. Instead of implementing this directly in
>>> the CouchDB core code, it might be better to write this as a separate
>>> application, similar to Dreyfus, Cloudant's search[1]. I'm hoping then that
>>> you wouldn't have to make huge modifications to the CouchDB codebase,
>>> which should make this easier to do.
>>> The application would override the _all_docs and _changes endpoints, and
>>> if a user has enabled access=true for that database, you could then serve
>>> the _all_docs and _changes requests from your application. The epi http
>>> work is pretty fancy; I think we could do some cool things around that to
>>> make this work well. The app would listen to the changes feeds of any
>>> database that has access=true and then implement the required indexes for
>>> _all_docs and _changes. I think we then would not have to create a custom
>>> indexer, as we could build the indexes when new changes arrive.
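
The changes-feed approach could look roughly like the following Python sketch. It assumes rows shaped like a `_changes?include_docs=true` response (`seq`, `id`, `deleted`, `doc`) and keeps an in-memory map from each `_access` principal to doc ids as a stand-in for a by-access-id index. The `AccessIndexer` class is hypothetical; a real app would also have to persist the index and `last_seq`.

```python
from __future__ import annotations

from collections import defaultdict


class AccessIndexer:
    """Hypothetical app-side indexer fed from one database's _changes feed.

    Maps each principal named in a doc's _access array (a user name or role)
    to the ids of the docs it can see: a stand-in for a by-access-id index.
    """

    def __init__(self) -> None:
        self.by_access_id: defaultdict[str, set[str]] = defaultdict(set)
        self._last_access: dict[str, list[str]] = {}  # doc id -> indexed _access
        self.last_seq = None  # where to resume the feed after a restart

    def apply_change(self, row: dict) -> None:
        """Apply one row of a _changes?include_docs=true response."""
        doc_id = row["id"]
        # Un-index the doc under the principals from its previous revision.
        for principal in self._last_access.pop(doc_id, []):
            self.by_access_id[principal].discard(doc_id)
        if not row.get("deleted"):
            access = row.get("doc", {}).get("_access", [])
            for principal in access:
                self.by_access_id[principal].add(doc_id)
            self._last_access[doc_id] = list(access)
        self.last_seq = row["seq"]

    def all_docs_for(self, principals: list[str]) -> list[str]:
        """Doc ids visible to any of `principals`, sorted like _all_docs."""
        visible: set[str] = set()
        for principal in principals:
            visible |= self.by_access_id.get(principal, set())
        return sorted(visible)
```

Even in this toy form, the app must remember each doc's previous `_access` value to un-index it, which is the kind of bookkeeping Jan's reply suggests couch_index already provides.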
>>> 
>>> I'm also hoping that another advantage of doing this as an app that listens
>>> to the changes feed is that there should be minimal work to get this
>>> working when we switch to FDB.
>>> 
>>> This is obviously just an idea I had and I thought I would share it, not in
>>> an attempt to derail what you're doing, but hopefully to make sure we find
>>> the easiest and most effective way to get this done.
>>> 
>>> Cheers
>>> Garren
>>> 
>>> 
>>> [1] https://github.com/cloudant-labs/dreyfus
>>> 
>>>> On Sun, Feb 17, 2019 at 4:25 PM Jan Lehnardt <[email protected]> wrote:
>>>> 
>>>> Hi Everyone,
>>>> 
>>>> I’m happy to share my work in progress attempt to implement the per-doc
>>>> access control feature we discussed a good while ago:
>>>> 
>>>> 
>>>> https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
>>>> 
>>>> You can check out my branch here:
>>>> 
>>>> https://github.com/apache/couchdb/compare/access?expand=1
>>>> 
>>>> It is very much a work in progress, but it is far enough along to warrant
>>>> discussion.
>>>> 
>>>> The main point of this branch is to show all the places that we would need
>>>> to change to support the proposal.
>>>> 
>>>> Things I’ve left for later:
>>>> 
>>>> - currently only the first element in the _access array is used. Our
>>>> and/or syntax can be added later.
>>>> - building per-access views has not been implemented yet, couch_index
>>>> would have to be taught about the new per-access-id index.
>>>> - pretty HTTP error handling
>>>> - tests except for a tiny shell script 😇
>>>> 
>>>> Implementation notes:
>>>> 
>>>> You create a database with the _access feature turned on like so:  PUT
>>>> /db?access=true
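
For reference, the same request built with Python's standard library. The host, port (5984 is CouchDB's default) and credentials are placeholders, and the helper name `create_access_db_request` is hypothetical. Creating a database normally needs server-admin rights, and `access=true` is only understood by the `access` branch.

```python
import base64
import urllib.parse
import urllib.request


def create_access_db_request(
    base_url: str, db: str, user: str, password: str
) -> urllib.request.Request:
    """Build (without sending) PUT /{db}?access=true with HTTP basic auth."""
    # Database names containing "/" must be sent URL-encoded (%2F).
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}?access=true"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, method="PUT", headers={"Authorization": f"Basic {token}"}
    )


# Placeholder node and admin credentials.
req = create_access_db_request("http://127.0.0.1:5984", "mydb", "admin", "secret")
```

Passing `req` to `urllib.request.urlopen` would actually send it.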
>>>> 
>>>> I started out storing _access in the document body, as that would allow
>>>> for a minimal change set. However, on doc updates we try hard not to load
>>>> the old doc body from the database, and forcing that for EVERY doc update
>>>> under _access seemed prohibitive, so I extended the #doc, #doc_info and
>>>> #full_doc_info records with a new `access` attribute that is stored in
>>>> both by-id and by-seq. I will need guidance on how extending these
>>>> records impacts multi-version cluster interop, and especially on whether
>>>> this is an acceptable approach.
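
The point of carrying `access` on the doc-info records is that a write can be authorized from by-id metadata alone. A rough Python sketch of that check: `FullDocInfo` is a simplified, hypothetical stand-in for #full_doc_info (only the `access` field mirrors the branch), the userCtx uses the usual `name`/`roles` shape with `_admin` marking admins, and, as in the branch today, only the first `_access` entry is consulted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FullDocInfo:
    """Simplified, hypothetical stand-in for a by-id #full_doc_info entry."""

    id: str
    update_seq: int
    deleted: bool
    access: tuple[str, ...]  # mirrors the new `access` attribute


def may_update(info: FullDocInfo, user_ctx: dict) -> bool:
    """Authorize an update to an existing doc from by-id metadata only.

    The old document body is never read: `info.access` is enough.
    """
    roles = user_ctx.get("roles", [])
    if "_admin" in roles:
        return True
    if not info.access:
        return False  # docs without _access are admin-only
    # Only the first element is checked, matching the branch's current state.
    return info.access[0] == user_ctx.get("name") or info.access[0] in roles
```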
>>>> 
>>>> 
>>>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36
>>>> 
>>>> * * *
>>>> 
>>>> The main addition is a new native query server called
>>>> couch_access_native_proc, which implements two new indexes, by-access-id
>>>> and by-access-seq. They do what you’d expect: pass in a userCtx and
>>>> retrieve the equivalent of _all_docs or _changes, but only including those
>>>> docs that match the username and roles in their _access property. The
>>>> existing handlers for _all_docs and _changes have been augmented to use the
>>>> new indexes instead of the default ones, unless the user is an admin.
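
A hypothetical Python sketch of what the two indexes return for a given userCtx, using the usual `{"name": ..., "roles": [...]}` shape and treating `_admin` in roles as admin. It filters in-memory lists rather than reading real indexes, integer sequence numbers stand in for CouchDB's opaque seq strings, only the first `_access` element is matched (as the branch currently does), and docs without `_access` are admin-only.

```python
from __future__ import annotations


def visible(doc: dict, user_ctx: dict) -> bool:
    """Would this doc appear in the user's by-access-id / by-access-seq view?"""
    roles = user_ctx.get("roles", [])
    if "_admin" in roles:
        return True  # admins keep using the default, unfiltered indexes
    access = doc.get("_access") or []
    if not access:
        return False  # docs without _access are admin-only
    # Like the branch today, only the first _access element is consulted.
    return access[0] == user_ctx.get("name") or access[0] in roles


def all_docs(docs: list[dict], user_ctx: dict) -> list[str]:
    """_all_docs equivalent: visible doc ids in id order."""
    return sorted(doc["_id"] for doc in docs if visible(doc, user_ctx))


def changes(
    docs_by_seq: list[tuple[int, dict]], user_ctx: dict, since: int = 0
) -> list[tuple[int, str]]:
    """_changes equivalent: (seq, id) of visible docs updated after `since`."""
    return [
        (seq, doc["_id"])
        for seq, doc in docs_by_seq
        if seq > since and visible(doc, user_ctx)
    ]
```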
>>>> 
>>>> 
>>>> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4
>>>> 
>>>> * * *
>>>> 
>>>> The rest of the diff is concerned with making document CRUD behave as
>>>> you’d expect. See this little demonstration for what things look like:
>>>> 
>>>> https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497 (I’m just
>>>> noticing that there might be something wonky with DELETE, but you’ll get
>>>> the gist #rimshot)
>>>> 
>>>> * * *
>>>> 
>>>> Open questions:
>>>> 
>>>> - The aim of this is to get as close to regular CouchDB behaviour as
>>>> possible. One thing that is new, however, and would require all apps to
>>>> be changed, is that docs in an _access-enabled database must include an
>>>> _access field (docs with no _access are admin-only for now). We might want
>>>> to consider auto-inserting the authenticated user’s name as the first
>>>> element in the _access array on new document writes, so existing apps
>>>> “just work”.
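
One possible reading of the auto-insert idea, as a Python sketch: on new writes only, put the authenticated user's name at the front of `_access` (adding the field if the app didn't send one) and drop any duplicate of that name further down. The function name and the dedupe behaviour are assumptions, not part of the branch.

```python
def with_default_access(doc: dict, user_name: str, is_new: bool) -> dict:
    """Prepend the writer's name to _access on new docs.

    New documents get a copy whose _access starts with `user_name`;
    updates are returned unchanged, keeping whatever _access they carry.
    """
    if not is_new:
        return doc
    rest = [p for p in doc.get("_access", []) if p != user_name]
    return {**doc, "_access": [user_name, *rest]}
```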
>>>> 
>>>> - Interplay with partitioned dbs: eschewing db-per-user is already a large
>>>> boon if you have a lot of users, but making those per-user requests inside
>>>> an _access-enabled database efficient would be doubly nice, so why not
>>>> use the username from the first question above as the partition key?
>>>> This would work nicely for natural users with their own docs that they
>>>> want to share with others later, but I can easily imagine a pipelined use
>>>> of CouchDB, where a “collector” user creates all new docs, an “analyser”
>>>> takes them over and hands them to a “result” user for viewing. In that
>>>> case, we’d violate the high-cardinality rule of partitions (have a lot of
>>>> small ones); instead, all docs go through all three users. I’d be okay
>>>> with treating the latter scenario as a minor use-case, but for that
>>>> use-case, we should be able to disable auto-partitioning on db creation.
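
A small Python sketch of the username-as-partition-key idea, assuming CouchDB's partitioned-database id format `partition:docid` and its rule that partition names may not start with an underscore; rejecting colons in user names and the helper names are assumptions. `partition_sizes` shows the cardinality problem in the pipeline case: the partition is fixed when the doc is created, so every doc stays in the collector's partition.

```python
from collections import Counter


def partitioned_id(user_name: str, doc_id: str) -> str:
    """Use the writer's name as the partition key: "<partition>:<doc id>"."""
    if not user_name or user_name.startswith("_") or ":" in user_name:
        raise ValueError(f"cannot use {user_name!r} as a partition key")
    return f"{user_name}:{doc_id}"


def partition_sizes(doc_ids):
    """Count docs per partition (the part of the id before the first colon)."""
    return Counter(doc_id.split(":", 1)[0] for doc_id in doc_ids)


# Natural users: many small partitions.
natural = [partitioned_id(u, "note-1") for u in ("alice", "bob", "carol")]
# Pipeline: everything lands in one partition, whoever handles it later.
pipeline = [partitioned_id("collector", f"sample-{n}") for n in range(3)]
```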
>>>> 
>>>> - Building access view indexes for docs whose _access changes frequently
>>>> leads to many orphaned view indexes; we should look at an auto-cleanup
>>>> solution here (maybe keep 1-N indexes in case folks just swap back and
>>>> forth).
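
The "keep 1-N indexes" idea maps naturally onto a least-recently-used policy. A hypothetical Python sketch (`AccessIndexCache` and its index keys are illustrative, not from the branch) that tracks per-access index keys and reports the ones falling out of the most recent `keep` as orphans to clean up:

```python
from __future__ import annotations

from collections import OrderedDict


class AccessIndexCache:
    """Hypothetical bookkeeping for per-access view indexes.

    Keeps the `keep` most recently used index keys and reports older ones
    as orphans that an auto-cleanup job could delete.
    """

    def __init__(self, keep: int) -> None:
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.keep = keep
        self._lru: OrderedDict[str, None] = OrderedDict()

    def touch(self, index_key: str) -> list[str]:
        """Mark `index_key` as used; return keys evicted as orphans."""
        self._lru[index_key] = None
        self._lru.move_to_end(index_key)
        evicted = []
        while len(self._lru) > self.keep:
            key, _ = self._lru.popitem(last=False)  # least recently used
            evicted.append(key)
        return evicted
```

A doc whose `_access` swaps back and forth between two principals keeps hitting indexes inside the window, so nothing is rebuilt; only indexes left unused for longer are cleaned up.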
>>>> 
>>>> * * *
>>>> 
>>>> I’ll leave this here for now; I’m sure there are a few more things to
>>>> consider.
>>>> 
>>>> I’d love to hear any and all feedback you might have. Especially if
>>>> anything is unclear.
>>>> 
>>>> Best
>>>> Jan
>>>> —
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>> 
>>
