Re: [MarkLogic Dev General] Document Level Authorization (Roles and Users)

Geert Josten Wed, 11 Dec 2013 10:58:05 -0800

Anyone from LDS listening to this thread? I recall they were doing lots of
user and role creation on the fly, for many, many users. And I'm sure
MarkMail will be able to pull out several threads on similar topics,
including some about the LDS case.




Cheers,

Geert



*From:* [email protected] [mailto:
[email protected]] *On Behalf Of *David Lee
*Sent:* Wednesday, December 11, 2013 18:58
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Document Level Authorization
(Roles and Users)



Great bits of info.

I will add a tidbit more ... but would love the conversation to continue, as
I think this is an increasingly common problem/question as ML starts to get
put to use on larger user bases.



1) In ML7, semantics is *built in* at the core level. It performs *vastly
faster* than the semantic libraries from V6.

2) During query, I cannot imagine anything outperforming the built-in
role/security module. It's done deeply in the code, it parallelizes across
nodes, and it doesn't require 2 passes (one to get the list of URIs, then
one to query against them), so in theory it should scale to any size. I have
not heard of any cases where the number of roles was an issue, except in the
Admin UI (port 8001), where the GUI is not optimized for huge numbers of
roles.

3) The built-in security is *rock solid* ... you can't circumvent it; it
literally hides the existence of documents that you have no rights to. It
has passed numerous security audits, and I doubt any but the most dedicated
could equal its security in user code.



BUT ...

What I don't know ... is the performance of change.

How expensive is it to:

A) Add a new set of N documents to a role (actually, add a new role to a set
of N documents)? It requires rewriting every document.

B) Create a new collection and then add all the relevant documents to it?
(This also requires rewriting every document.)

C) Add a new role if you already have thousands or millions of them? Is it
linear, or does it take increasingly long to maintain large numbers of
roles?
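A toy Python model may make A and B concrete. It is not MarkLogic code: the classes `Doc`, `RoleStore` and `TripleStore` are hypothetical, and the model only counts how many stored documents each approach rewrites when one new reader is granted N documents. It says nothing about how long any of this takes.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A stored document with its permission metadata (toy model)."""
    uri: str
    roles: set[str] = field(default_factory=set)


class RoleStore:
    """Role-based sharing: the roles live on each document."""

    def __init__(self, uris):
        self.docs = {u: Doc(u) for u in uris}
        self.doc_rewrites = 0  # documents whose metadata had to be rewritten

    def grant(self, role, uris):
        for u in uris:
            self.docs[u].roles.add(role)
            self.doc_rewrites += 1  # every target document is touched


class TripleStore:
    """Share-as-data: (user, 'canRead', uri) facts live outside the documents."""

    def __init__(self, uris):
        self.triples = set()
        self.doc_rewrites = 0  # stays 0: target documents are never modified

    def grant(self, user, uris):
        for u in uris:
            self.triples.add((user, "canRead", u))  # a new small insert only


uris = [f"/docs/{i}.xml" for i in range(10_000)]
roles, triples = RoleStore(uris), TripleStore(uris)
roles.grant("share-alice", uris)
triples.grant("alice", uris)
print(roles.doc_rewrites, triples.doc_rewrites, len(triples.triples))
```

With roles (or collections) every target document is rewritten; with share facts stored separately, the target documents stay untouched, although N small records are still inserted.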





I don't know the answers to these ... but they are worth considering.

So far, to my mind, most arguments favor using the built-in role/user
access for this purpose: it's rock solid for security, and for query it's
an *obvious* performance gain.

BUT ... suppose A, B, C are "expensive" AND they are frequently executed.
At some point it might make more sense to handle user roles at the app
level, depending on how often you change roles vs. how often you query
documents.
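One hedged way to frame "how often you change roles vs. how often you query" is a simple total-cost comparison. The per-operation costs below are made-up placeholders chosen only to show the shape of the trade-off, and the function names are hypothetical.

```python
# Placeholder per-operation costs in seconds: (change_cost, query_cost).
# These are NOT measurements; substitute numbers measured on your own data.
BUILTIN = (10.0, 0.02)   # changes rewrite documents; queries are filtered natively
APP_LEVEL = (0.5, 2.0)   # changes are cheap inserts; queries need an extra pass


def total_cost(changes, queries, costs):
    """Time spent over some period for a given mix of changes and queries."""
    change_cost, query_cost = costs
    return changes * change_cost + queries * query_cost


def cheaper_approach(changes, queries):
    """Pick whichever approach has the lower total cost for this workload."""
    if total_cost(changes, queries, BUILTIN) <= total_cost(changes, queries, APP_LEVEL):
        return "built-in roles"
    return "app-level"


print(cheaper_approach(changes=10, queries=100_000))  # query-heavy workload
print(cheaper_approach(changes=10_000, queries=100))  # change-heavy workload
```

Where the break-even point falls depends entirely on the measured costs, which is exactly what A, B and C leave open.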







-----------------------------------------------------------------------------

David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224

Cell:  +1 812-630-7622
www.marklogic.com



*From:* [email protected] [mailto:
[email protected]]
*On Behalf Of *Harry B.
*Sent:* Wednesday, December 11, 2013 12:37 PM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Document Level Authorization (Roles
and Users)



I have a lot more to add to what you've brought up, David, but I'm short on
time at the moment. I can add a few things quickly and perhaps put together
more detail as a blog post or two later...



First of all, scalability of user roles is really not a huge issue.
MarkLogic stores all the data, roles, etc. as XML, so in theory it's as
massively scalable as needed. That said, I've only implemented/tested this
approach with up to about 2000 users. I have verified it across a few
million documents for those couple thousand users, though.



Secondly, the semantic approach is something else I have done when
role-based options weren't available in the project design. I do think this
is a very strong application logic-based approach and in general it is very
performant for what it is, though good query construction is essential for
it to scale. It is very fast for creating large numbers of shares since
it's an insert operation (or for revoking large numbers of shares). For a
recent project, I initially went with the semantic approach that I had done
with another project, but this time I did a side-by-side comparison. When
there was a user with tens of thousands of shared documents (documents they
were entitled to at least read), the query time was somewhere around 2
seconds without any tuning or tweaking. The same query using roles took
0.02 seconds. I basically figured on it being two orders of magnitude
slower. That was a quick and dirty analysis, however, and I don't know if
using ML7's native support instead of a home-grown version of Michael
Blakeley's semantic library or other tuning/optimization might have brought
that down. It was enough at the time to convince me to push for leveraging
the ML security model.



The main reason to use the built-in roles to control document access is
that ML has to do that query/processing no matter what. Collections,
semantics, and even adding data or properties to a document all "work", so
it's a matter of balancing your trade-offs.



More in a while...



On Wed, Dec 11, 2013 at 9:06 AM, David Lee <[email protected]> wrote:

A first and foremost question to ask is: are you asking for server-level
security on this sharing, or are you happy with application-level?

If you want or need server-level security (that is, if someone were to
access the ML server directly using their credentials and start issuing
queries, could they gain access to docs they shouldn't?), then the only way
I know of to do this right is using the server-supplied role-based security.
It is *hard code baked in* ... you simply cannot break it ... you can't even
tell that documents you do not have access to exist. It's also extremely
efficient on query because it's done very deeply in the server. But it comes
at the "price" of using the built-in security measures, mainly the price of
having to touch every document whose roles or set of collections change.

This is not a bad thing. It's a great thing, but it does limit your choices,
and there is a performance hit. (How much? As with most things, "it
depends.")



If, on the other hand, you physically restrict access to the ML server to
your app only, and you are confident in *your code* ... then there are
other options.



One I have been thinking about lately is the use of ML7 semantics
features. This is a very lightweight way of storing lists of things; it
could, for example, store associations between users and the documents
they can view. It is similar to storing this data in an XML file(s), but
much faster for some use cases because of the way it's indexed, and you
don't have to change the target documents to change the list of who can see
them, unlike changing what collections or roles a document has. It does
require doing a 2-phase query, though: the first query lists the set of
documents a user is allowed to see, then a second query applies that list
as a constraint on a search.
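The two-phase query can be sketched in plain Python, with an in-memory set of (user, "canRead", uri) triples standing in for the ML7 triple store. The predicate name, the data, and the functions are all hypothetical; in MarkLogic the first phase would be a triple query and the second a search constrained to the URIs it returns.

```python
# Hypothetical share facts as (subject, predicate, object) triples.
TRIPLES = {
    ("alice", "canRead", "/docs/1.xml"),
    ("alice", "canRead", "/docs/3.xml"),
    ("bob", "canRead", "/docs/2.xml"),
}

# Hypothetical document contents, keyed by URI.
DOCS = {
    "/docs/1.xml": "blood pressure reading",
    "/docs/2.xml": "blood test results",
    "/docs/3.xml": "allergy list",
}


def readable_uris(user):
    """Phase 1: list the documents the user is allowed to see."""
    return {o for (s, p, o) in TRIPLES if s == user and p == "canRead"}


def search(user, term):
    """Phase 2: run the content search, constrained to the phase-1 URI list."""
    allowed = readable_uris(user)
    return sorted(uri for uri in allowed if term in DOCS[uri])


print(search("alice", "blood"))  # ['/docs/1.xml']
print(search("bob", "blood"))    # ['/docs/2.xml']
```

The phase-1 list grows with the number of shares a user has, which is likely one reason this style of query gets slower for a user with tens of thousands of shared documents.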



I




*From:* [email protected] [mailto:
[email protected]] *On Behalf Of *Timothy W. Cook
*Sent:* Wednesday, December 11, 2013 9:44 AM
*To:* MarkLogic Developer Discussion


*Subject:* Re: [MarkLogic Dev General] Document Level Authorization (Roles
and Users)







On Wed, Dec 11, 2013 at 11:41 AM, David Lee <[email protected]> wrote:

Harry, how many users have you tried with this scheme?
I am myself considering something for a demo app but am not sure if it
scales to thousands, hundreds of thousands, or millions of users.





This is my concern also. I need to scale to millions of users. However,
each user will likely have fewer than one hundred other users to share
documents with.



There is also the issue that if you want to share a large set of documents
with a new user (say 10,000 docs), then those 10,000 docs need to be
"touched" (i.e. read and written); this could be a heavy operation.





This is a scalability issue I would like to see if someone has experience
with.  I could easily have a user with 10,000 or more documents.  What is
the performance like when a new share is created across all of them?





The alternative, which is not as elegant but might perform better, is to
keep access lists as data (say in an XML file or files) and handle the
access control at the user level.

You are right, this is not as clean nor as proven as using the system-level
access control, but it might be

* faster

* easier





This seems to be a brittle approach.  Though it may be the best?
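A minimal sketch of keeping access lists as data and enforcing them in application code, assuming a hypothetical XML layout (`<acl>` containing `<share user="..." doc="..."/>` elements) and using only Python's standard `xml.etree.ElementTree`. It is only as safe as the application's control over who can reach the server, which is where the brittleness comes from.

```python
import xml.etree.ElementTree as ET

# Hypothetical access-list document; element and attribute names are made up.
ACL_XML = """
<acl>
  <share user="alice" doc="/docs/1.xml"/>
  <share user="alice" doc="/docs/3.xml"/>
  <share user="bob" doc="/docs/2.xml"/>
</acl>
"""


def load_acl(xml_text):
    """Build a user -> set-of-URIs map from the access-list XML."""
    acl = {}
    for share in ET.fromstring(xml_text.strip()).iter("share"):
        acl.setdefault(share.get("user"), set()).add(share.get("doc"))
    return acl


def can_read(acl, user, uri):
    """Application-level check: the app, not the server, enforces this."""
    return uri in acl.get(user, set())


acl = load_acl(ACL_XML)
print(can_read(acl, "alice", "/docs/3.xml"))  # True
print(can_read(acl, "bob", "/docs/3.xml"))    # False
```

Granting a share here means editing the ACL data rather than the target documents; the flip side is that nothing in this check stops a client that bypasses the application.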





Another option might be to store the access list of a document in document
properties. You still have to touch the same number of files, but with
potentially smaller changes (assuming the access list is smaller than the
document), and you can do property-based searches combined with document
searches, so no "joining" is required.



This approach also crossed my mind because in relative terms, my access
list will be small.
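The properties-based variant can be modeled the same way: each document carries a small access list alongside its content, so a single search tests both and no join is needed. This is a toy in-memory model with hypothetical names (`StoredDoc`, `readers`, `share`), not MarkLogic's properties API.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    content: str
    readers: frozenset  # hypothetical access list kept in the document's properties


STORE = {
    "/docs/1.xml": StoredDoc("blood pressure reading", frozenset({"alice"})),
    "/docs/2.xml": StoredDoc("blood test results", frozenset({"bob", "carol"})),
    "/docs/3.xml": StoredDoc("allergy list", frozenset({"alice", "bob"})),
}


def search(user, term):
    """One pass: the content test and the properties test are applied together."""
    return sorted(
        uri for uri, d in STORE.items()
        if term in d.content and user in d.readers
    )


def share(uri, user):
    """Sharing replaces only the small properties record; the content is reused."""
    d = STORE[uri]
    STORE[uri] = StoredDoc(d.content, d.readers | {user})


print(search("bob", "blood"))  # ['/docs/2.xml']
share("/docs/1.xml", "bob")
print(search("bob", "blood"))  # ['/docs/1.xml', '/docs/2.xml']
```

Every shared document is still touched, as noted above, but each change is confined to the access list rather than the document body.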



I think this would make a great paper or blog post:



"How to handle access control of large numbers of users and documents"



Good idea.  Now we just need to do the research.  :-)





One thing I am not certain of yet: what are the security and performance
implications of using keywords in a document and then, through a query,
providing visibility (to the UI) to only some of the documents? IOW: a user
might have read access to documents in a collection without knowing that
they exist, and without any access to the collection except via the UI.
Security through obscurity kind of rules out that idea, though. Thoughts?
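A toy model of the security-through-obscurity concern (all names hypothetical): if visibility is decided only by a keyword filter in the UI, a direct request by URI is not filtered at all. That is the gap that server-level permissions are meant to close.

```python
# Hypothetical store in which the server itself enforces nothing.
DOCS = {
    "/notes/a.xml": {"keywords": {"shared-with:alice"}, "body": "visible note"},
    "/notes/b.xml": {"keywords": {"shared-with:bob"}, "body": "private note"},
}


def ui_list(user):
    """What the UI shows: only documents tagged with the user's keyword."""
    tag = f"shared-with:{user}"
    return sorted(u for u, d in DOCS.items() if tag in d["keywords"])


def direct_get(uri):
    """A direct request to the server: no keyword check is involved."""
    return DOCS[uri]["body"]


print(ui_list("alice"))            # ['/notes/a.xml']
print(direct_get("/notes/b.xml"))  # prints: private note
```

Hiding a document from a listing is not the same as denying access to it; anyone who can query the server and knows or guesses a URI gets the content.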



--Tim





-- 

MLHIM VIP Signup: http://goo.gl/22B0U
============================================
Timothy Cook, MSc           +55 21 94711995
MLHIM http://www.mlhim.org
Like Us on FB: https://www.facebook.com/mlhim2
Circle us on G+: http://goo.gl/44EV5
Google Scholar: http://goo.gl/MMZ1o
LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
