I just ran across this in the docs under Collection Performance, so it is
added here as part of the discussion.

"A practical guideline is that a document with fragments averaging 50K in
size should not belong to more than 100 collections. This should keep the
average fragment size increase to less than 10%."
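
To put rough numbers on that guideline, here is a back-of-the-envelope sketch in Python. It only rearranges the three figures quoted above; the assumptions that "50K" means 50 * 1024 bytes and that growth is spread evenly across collections are mine, not the docs'.

```python
# Back-of-the-envelope arithmetic for the quoted guideline.
# Assumptions (not from the docs): "50K" = 50 * 1024 bytes, and the size
# increase is spread evenly across the collections a document belongs to.

FRAGMENT_BYTES = 50 * 1024   # average fragment size from the guideline
MAX_COLLECTIONS = 100        # collection count from the guideline
MAX_GROWTH = 0.10            # "less than 10%" increase


def implied_bytes_per_collection() -> float:
    """Largest average per-collection cost that stays within the guideline."""
    return FRAGMENT_BYTES * MAX_GROWTH / MAX_COLLECTIONS


def growth_fraction(collections: int, bytes_per_collection: float) -> float:
    """Fractional fragment growth for a given number of collections."""
    return collections * bytes_per_collection / FRAGMENT_BYTES


if __name__ == "__main__":
    per_coll = implied_bytes_per_collection()
    print(f"implied budget per collection: {per_coll:.1f} bytes")
    for n in (10, 100, 1000):
        print(f"{n:>5} collections -> {growth_fraction(n, per_coll):.0%} growth")
```

Under those assumptions the guideline implies a budget of roughly 50 bytes per collection, so a collection-per-share scheme with thousands of collections per document would go well past the 10% mark.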


On Wed, Dec 11, 2013 at 4:57 PM, Geert Josten <[email protected]> wrote:

> Anyone from LDS listening to this thread? I recall they were doing lots of
> user and role creation on the fly, for many many users. And I’m sure
> markmail will be able to pull out several threads on similar topics,
> including some about the LDS case.
>
>
>
> Cheers,
>
> Geert
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *David Lee
> *Sent:* Wednesday, December 11, 2013 18:58
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Document Level Authorization
> (Roles and Users)
>
>
>
> Great bits of info.
>
> I will add a tidbit more ... but would love the conversation to continue
> as I think this is an increasingly common problem/question as ML starts to
> get put to use on larger user bases.
>
>
>
> 1) In ML7 semantics is *built in* at the core level.  It performs *vastly
> faster* than the semantic libraries from V6.
>
> 2) During query I cannot imagine anything outperforming the built-in
> role/security module.  It's done deep in the code, it parallelizes across
> nodes, and it doesn't require 2 passes (one to get the list of URIs, then
> one to query against them), so in theory it should scale to any size. I
> have not heard of any cases where the number of roles was an issue, except
> in the Admin UI (port 8001), where the GUI is not optimized for huge
> numbers of roles.
>
> 3) The built-in security is *rock solid* ... you can't circumvent it; it
> literally hides the existence of documents that you have no rights to.  It
> has passed numerous security audits and I doubt any but the most dedicated
> could equal the security aspects in user code.
>
>
>
> BUT ...
>
> What I don't know ... is the performance of change.
>
> How expensive is it to
>
> A) Add a new set of N documents to a role (actually add a new role to a
> set of N documents) ... it requires rewriting every document
>
> B) How expensive is it to create a new collection then add all the
> relevant documents to that ? (it requires rewriting every document)
>
> C) How expensive is it to add a new role if you have thousands or millions
> ?  Is it linear or does it take increasingly long to maintain large numbers
> of roles ?
>
>
>
>
>
> I don't know the answers to these ... but they are worth considering.
>
> So far, to my mind, most arguments favor using the built-in role/user
> access for this purpose ... it's rock solid for security and for query it's
> an *obvious* performance gain.
>
> BUT ... suppose A, B, C are "expensive" AND they are frequently executed:
> at some point it might make more sense to handle user roles at the app
> level ... depending on how often you change roles vs. how often you query
> documents.
>
>
>
>
>
>
>
>
> -----------------------------------------------------------------------------
>
> David Lee
> Lead Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 812-482-5224
>
> Cell:  +1 812-630-7622
> www.marklogic.com
>
>
>
> *From:* [email protected] [
> mailto:[email protected]]
> *On Behalf Of *Harry B.
> *Sent:* Wednesday, December 11, 2013 12:37 PM
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Document Level Authorization
> (Roles and Users)
>
>
>
> I have a lot more to add to what you've brought up, David, but short on
> time at the moment. I can add a few things quickly and perhaps put together
> more detail as a blog post or two later...
>
>
>
> First of all, scalability of user roles is really not a huge issue.
> MarkLogic stores all the data, roles, etc. as XML, so in theory it's as
> massively scalable as needed. That said, I've only implemented/tested this
> approach with up to about 2000 users. I have verified it across a few
> million documents for those couple thousand users, though.
>
>
>
> Secondly, the semantic approach is something else I have done when
> role-based options weren't available in the project design. I do think this
> is a very strong application logic-based approach and in general it is very
> performant for what it is, though good query construction is essential for
> it to scale. It is very fast for creating large numbers of shares since
> it's an insert operation (or for revoking large numbers of shares). For a
> recent project, I initially went with the semantic approach that I had done
> with another project, but this time I did a side-by-side comparison. When
> there was a user with tens of thousands of shared documents (documents they
> were entitled to at least read), the query time was somewhere around 2
> seconds without any tuning or tweaking. The same query using roles took
> 0.02 seconds. I basically figured on it being two orders of magnitude
> slower. That was a quick and dirty analysis, however, and I don't know if
> using ML7's native support instead of a home-grown version of Michael
> Blakeley's semantic library or other tuning/optimization might have brought
> that down. It was enough at the time to convince me to push for leveraging
> the ML security model.
>
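The two timings above (about 2 seconds for the semantic approach, 0.02 seconds with roles) can be combined with the "how often do you change vs. how often do you query" question into a rough break-even estimate. This is a minimal Python sketch; the per-change costs passed in at the bottom (60 s and 0.1 s) are made-up placeholders, not measurements from this thread.

```python
# Rough break-even model: built-in roles vs. an application-level share list.
# Query timings come from the side-by-side comparison quoted above; the
# change costs used in the example call are hypothetical placeholders.

ROLE_QUERY_S = 0.02  # query time using built-in roles (from the thread)
APP_QUERY_S = 2.0    # query time using the semantic approach (from the thread)


def speedup() -> float:
    """How many times faster the role-based query was."""
    return APP_QUERY_S / ROLE_QUERY_S


def break_even_queries_per_change(role_change_s: float, app_change_s: float) -> float:
    """Queries per share change above which built-in roles win overall.

    Roles cost more per change (documents get rewritten) but save time on
    every query, so the extra change cost is divided by the per-query saving.
    """
    extra_change_cost = max(role_change_s - app_change_s, 0.0)
    saved_per_query = APP_QUERY_S - ROLE_QUERY_S
    return extra_change_cost / saved_per_query


if __name__ == "__main__":
    print(f"role query speedup: {speedup():.0f}x")
    # Hypothetical: a share change costs 60 s with roles, 0.1 s at app level.
    n = break_even_queries_per_change(role_change_s=60.0, app_change_s=0.1)
    print(f"roles pay off after about {n:.1f} queries per change")
```

With real measurements for the change costs (questions A, B and C earlier in the thread), the same arithmetic would show which side of the trade-off a given workload falls on.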
>
>
> The main reason to use the built in roles to control document access is
> that ML has to do that query/processing no matter what. Collections,
> semantics, and even adding data or properties to a document all "work" so
> it's a matter of balancing your trade-offs.
>
>
>
> More in a while...
>
>
>
> On Wed, Dec 11, 2013 at 9:06 AM, David Lee <[email protected]>
> wrote:
>
> A first and foremost question to ask is are you asking for server level
> security on this sharing or are you happy with application level?
>
> If you want or need server level security ( that is, if someone were to
> access the ML server directly using their credentials and start issuing
> queries could they gain access to docs they shouldn't) then the only way I
> know of to do this right is using the server supplied role based security.
> It is *hard coded, baked in* ... you simply cannot break it ... you can't
> even tell the existence of documents to which you do not have access.  It's
> also extremely efficient on query because it's done very deep in the
> server. But it comes at the "price" of using the built-in security
> measures, mainly the price of having to touch every document that has its
> role changed or its set of collections changed.
>
> This is not a bad thing.  It's a great thing, but it does limit your
> choices and there is a performance hit.  (How much? As with most things,
> "it depends".)
>
>
>
> If, on the other hand, you physically restrict access to the ML server to
> your app only, and you are confident in *your code* ... then there are
> other options.
>
>
>
> One I have been thinking about lately is the use of ML7 semantics
> features.  This is a very lightweight way of storing lists of things; it
> could, for example, store associations between users and the documents
> they can view.  It is similar to storing this data in XML file(s) ... but
> much faster for some use cases because of the way it's indexed, and you
> don't have to change the target documents to change the list of who can
> see them - unlike changing what collections or roles a document has.  It
> does require a 2-phase query, though: a first query to list the set of
> documents a user is allowed to see, then a second query that applies that
> list as a constraint on a search.
>
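The 2-phase query described above can be sketched in a few lines. This is a plain in-memory Python illustration of the idea, not MarkLogic code: the `canRead` predicate, the `user:` naming, the URIs and the substring "search" are all hypothetical stand-ins.

```python
# In-memory sketch of the two-phase approach: shares are stored as triples,
# separate from the documents, and a search is constrained by the share list.
# All names and data here are hypothetical illustrations.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: List[Triple] = [
    ("user:alice", "canRead", "/docs/a.xml"),
    ("user:alice", "canRead", "/docs/b.xml"),
    ("user:bob", "canRead", "/docs/b.xml"),
]

DOCUMENTS: Dict[str, str] = {
    "/docs/a.xml": "quarterly budget report",
    "/docs/b.xml": "budget meeting notes",
    "/docs/c.xml": "confidential budget forecast",
}


def readable_uris(user: str) -> Set[str]:
    """Phase 1: list the documents the user is allowed to see."""
    return {obj for subj, pred, obj in TRIPLES if subj == user and pred == "canRead"}


def search(user: str, term: str) -> List[str]:
    """Phase 2: run the search with the phase 1 list as a constraint."""
    allowed = readable_uris(user)
    return sorted(uri for uri in allowed if term in DOCUMENTS.get(uri, ""))


def share(user: str, uri: str) -> None:
    """Granting access is a single insert; the target document is untouched."""
    TRIPLES.append((user, "canRead", uri))


if __name__ == "__main__":
    print(search("user:bob", "budget"))
    share("user:bob", "/docs/c.xml")
    print(search("user:bob", "budget"))
```

Note that nothing here stops code from reading `DOCUMENTS` directly, which is the application-level trust assumption mentioned above.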
>
>
> I
>
>
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Timothy W. Cook
> *Sent:* Wednesday, December 11, 2013 9:44 AM
> *To:* MarkLogic Developer Discussion
>
>
> *Subject:* Re: [MarkLogic Dev General] Document Level Authorization
> (Roles and Users)
>
>
>
>
>
>
>
> On Wed, Dec 11, 2013 at 11:41 AM, David Lee <[email protected]>
> wrote:
>
> Harry, how many users have you tried with this scheme ?
> I am myself considering something for a demo app but not sure if it scales
> to thousands or hundreds of thousands or millions of users.
>
>
>
>
>
> This is my concern also.  I need to scale to millions of users.  However,
> each user will likely have less than one hundred other users to share
> documents with.
>
>
>
> There is also the issue that if you want to share a large set of documents
> with a new user (say 10,000 docs), then those 10,000 docs need to be
> "touched" (e.g. read and written); this could be a heavy operation.
>
>
>
>
>
> This is a scalability issue I would like to see if someone has experience
> with.  I could easily have a user with 10,000 or more documents.  What is
> the performance like when a new share is created across all of them?
>
>
>
>
>
> The alternative, which is not as elegant but might perform better, is to
> keep access lists as data (say in an XML file or files) and handle the
> access control at the user level.
>
> You are right, this is not as clean or proven as using the system-level
> access control, but it might be
>
> * faster
>
> * easier
>
>
>
>
>
> This seems to be a brittle approach.  Though it may be the best?
>
>
>
>
>
> Another option might be to store the access list of a document in document
> properties.   You still have to touch the same number of files, but the
> changes are potentially smaller (assuming the access list is smaller than
> the document), and you can do property-based searches combined with
> document searches, so no "joining" is required.
>
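The properties idea above can be sketched the same way. In this minimal Python illustration the `readers` field stands in for an access list kept in document properties; the class, field and function names are hypothetical and this is not MarkLogic's properties API.

```python
# Sketch of keeping the access list next to each document (standing in for
# document properties). Search filters on access and content in one pass, so
# no join is needed, but sharing still touches every affected document.
# All names here are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Document:
    uri: str
    body: str
    readers: Set[str] = field(default_factory=set)  # the per-document access list


def search(docs: Iterable[Document], user: str, term: str) -> List[str]:
    """Single pass: access check and content match are evaluated together."""
    return [d.uri for d in docs if user in d.readers and term in d.body]


def share(docs: Iterable[Document], uris: Set[str], user: str) -> int:
    """Add a reader to each listed document; returns how many were touched."""
    touched = 0
    for d in docs:
        if d.uri in uris and user not in d.readers:
            d.readers.add(user)  # only the small access list changes
            touched += 1
    return touched


if __name__ == "__main__":
    docs = [
        Document("/docs/a.xml", "quarterly budget report", {"alice"}),
        Document("/docs/b.xml", "budget meeting notes", {"alice"}),
    ]
    print(share(docs, {"/docs/a.xml", "/docs/b.xml"}, "bob"))
    print(search(docs, "bob", "budget"))
```

The count returned by `share` makes the trade-off visible: the number of documents touched is the same as with roles, only the size of each change is smaller.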
>
>
> This approach also crossed my mind because in relative terms, my access
> list will be small.
>
>
>
> I think this would make a great paper or blog
>
>
>
> "How to handle access control of large numbers of users and documents"
>
>
>
> Good idea.  Now we just need to do the research.  :-)
>
>
>
>
>
> One thing I am not certain of yet.  What are the security and performance
> implications of using keywords in a document and then, through a query,
> providing visibility (to the UI) to only some of the documents? IOW: a user
> might have read access to documents in a collection, but not know that
> they exist and not have any access to the collection except via the UI.
> Security through obscurity kind of rules out that idea though.  Thoughts?
>
>
>
> --Tim
>
>
>
>
>
> --
>
> MLHIM VIP Signup: http://goo.gl/22B0U
> ============================================
> Timothy Cook, MSc           +55 21 94711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>
>
>
>
>


-- 
MLHIM VIP Signup: http://goo.gl/22B0U
============================================
Timothy Cook, MSc           +55 21 94711995
MLHIM http://www.mlhim.org
Like Us on FB: https://www.facebook.com/mlhim2
Circle us on G+: http://goo.gl/44EV5
Google Scholar: http://goo.gl/MMZ1o
LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
