RE: FW: Solr and LCF security at query time

karl.wright Thu, 29 Apr 2010 07:46:04 -0700

If we aren't talking about a repository of some kind, then we aren't talking 
about using LCF.  If your design point is about applying security to NFS via an 
acl-xml file, your uploaded contribution will do that just fine (although I 
think you might want to use Filters in some places where you are currently 
using Queries, according to what I've learned over the past day or two).


If a repository with security is involved, there's no benefit I can see to 
building yet another security mechanism above and beyond the one that the 
repository would provide.  It's double the administration, and in that light 
only makes sense at all if there's no native security mechanism present in 
whatever your data source is.  There are certainly a number of "repositories" 
with this characteristic, though - the web, rss feeds, file systems, etc.

Karl

________________________________
From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
Sent: Thursday, April 29, 2010 9:56 AM
To: dev@lucene.apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

- There's a significant extra load on the repository, because every search 
result has to be checked against the repository in real time

By repository, do you mean, for example, NTFS? You certainly wouldn't want, or 
need, to do that at all, particularly for environments where the repository 
isn't available. That's kind of the point of having the acl decoupled.

- It will perform very poorly on queries where there are a lot of matching 
documents, but the search user can't see most of them

The performance of the filter queries would be no worse (or better) than any 
other of similar length/complexity. Essentially, the filter queries between the 
two models are just using a different set of attributes (acl-specific vs. 
intrinsic to the document). If someone felt they needed to build lots of 
super-long complex filter queries to define a set of allowed/denied documents, 
their general search performance is probably not going to be great anyway, and 
would be remedied by organizing the data more efficiently (which is a good idea 
in any case).


Thanks,
Peter


On Thu, Apr 29, 2010 at 1:10 PM, <karl.wri...@nokia.com> wrote:
Putting access control lookup at search-result time has the following benefits:

- It sees changes right away, when the underlying repository changes

Here are the drawbacks, as far as I can see:

- There's a significant extra load on the repository, because every search 
result has to be checked against the repository in real time
- It will perform very poorly on queries where there are a lot of matching 
documents, but the search user can't see most of them

Having only one general solution means that you have to pick one or the other 
of the two models.  We opted for the model we did because the drawbacks were 
potentially severe, especially under conditions of high demand.  The repository 
load question is not a trivial one, because it scales as the number of results 
returned, which is a potentially gigantic number.
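The scaling concern above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: CountingRepository, can_read and filter_at_result_time are hypothetical names, not LCF or Solr APIs, and the in-memory dictionary stands in for a live repository that would be contacted over the network.

```python
from dataclasses import dataclass


@dataclass
class CountingRepository:
    """Hypothetical stand-in for a live repository; counts access checks."""
    visible: dict  # user -> set of document ids that user may read
    checks: int = 0

    def can_read(self, user, doc_id):
        self.checks += 1  # each call represents one real-time round trip
        return doc_id in self.visible.get(user, set())


def filter_at_result_time(repo, user, matches, page_size):
    """Walk the matches in order, checking each one until a page is full."""
    page = []
    for doc_id in matches:
        if len(page) == page_size:
            break
        if repo.can_read(user, doc_id):
            page.append(doc_id)
    return page


# 10,000 documents match the query, but the user may see only the last two.
matches = list(range(10_000))
repo = CountingRepository(visible={"alice": {9_998, 9_999}})
page = filter_at_result_time(repo, "alice", matches, page_size=10)
print(page, repo.checks)
```

In this sketch the page never fills, so every one of the 10,000 matches costs one repository check. That is the case where many documents match but the search user can see few of them.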

However, I am perfectly fine with supporting both models.  Your suggested 
solution will work for some classes of problem.  It seems to me that in order 
to support it you will need a parallel infrastructure to do that.  We could 
develop that infrastructure within LCF, but it's a bit of work to do:

(1) Output an "internal repository document security identifier" into the 
index, in addition to tokens.  This id is not the same at all as the document's 
URI, which is what literal.id is currently set to, so a new 
Solr schema field would need to be made for this.  All output connectors would 
need to be modified to do this, and all repository connectors as well.
(2) Since the security identifier would be valid within the context of a given 
repository connection, the "authority service" code that tries to verify 
visibility of a document given the authenticated user name and security 
identifier would need to look up the correct repository connection and call a 
method within it - which currently doesn't exist.  So we'd need to write such a 
method for all connectors that have security.
(3) Since this service would have a high load, and only be used under one 
particular model, I'd suggest actually defining a whole new webapp for it, so 
it can be distributed/controlled independently.
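
Step (2) could look roughly like the sketch below. It is not LCF code: RepositoryConnector, is_visible, AuthorityService and the connection name "sharepoint-uk" are all hypothetical, and it assumes the connection name is stored in the index next to the security identifier so the service knows which connection to ask.

```python
class RepositoryConnector:
    """Hypothetical connector interface.

    is_visible() stands for the per-connector method that step (2)
    says does not exist yet.
    """

    def is_visible(self, user, security_id):
        raise NotImplementedError


class InMemoryConnector(RepositoryConnector):
    """Toy connector backed by a dict instead of a real repository."""

    def __init__(self, grants):
        self._grants = grants  # security_id -> set of user names

    def is_visible(self, user, security_id):
        return user in self._grants.get(security_id, set())


class AuthorityService:
    """Routes a visibility check to the connection that owns the identifier."""

    def __init__(self):
        self._connections = {}

    def register(self, connection_name, connector):
        self._connections[connection_name] = connector

    def check(self, user, connection_name, security_id):
        connector = self._connections.get(connection_name)
        if connector is None:
            return False  # unknown connection: deny rather than guess
        return connector.is_visible(user, security_id)


service = AuthorityService()
service.register("sharepoint-uk", InMemoryConnector({"doc-17": {"alice"}}))
print(service.check("alice", "sharepoint-uk", "doc-17"))
print(service.check("bob", "sharepoint-uk", "doc-17"))
```

The security identifier is only meaningful inside its own repository connection, which is why the lookup is keyed on the connection first. Denying on an unknown connection is a design choice of this sketch, not something specified above.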

Karl


________________________________
From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
Sent: Thursday, April 29, 2010 5:35 AM
To: connectors-u...@incubator.apache.org
Cc: dev@lucene.apache.org; connectors-...@incubator.apache.org; 
lucene-...@apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

I guess it comes down to - any solution is ultimately going to place access 
control on a search and not on data, so there isn't much to be gained by 
binding the access control to the data. Whatever attributes exist at index time 
to build an acl will still be there at query time, so by making the acl 
search-bound, the acl is decoupled from the data, allowing it to be used in any 
use case scenario.

Here's a typical sampling of use cases where the decoupling of acl from data is 
required:

One customer has a 'shop-search' requirement where logged-in users' access to 
various shops changes daily, sometimes 4 or 5 times a day. There are several 
hundred such shops and 10s of millions of documents, and the indexing part 
doesn't have ownership of any of the 'source' documents.

Another example is a customer who has multiple sites and multiple AD domains. 
They have one domain for the UK, but a completely separate domain for 
Gibraltar. When data is replicated to remote servers accessed by Gibraltar 
staff, these users have no SID information in the other domain.

An 'interesting' example of this at the extreme is 34rkl4ys Bank, where, due to 
departmental history, they have no fewer than 85 AD domains! This of course is 
a nightmare in itself, but trying to tie access information to data at storage 
time is virtually impossible in this environment.

The thing I'm trying to understand is that the decoupled approach works equally 
well for the requirements where you do have acl information at index time. I 
guess I'm not understanding the advantages to making schema changes and binding 
acl to data, when there's really no need. I particularly like your idea of 
using LCF as the facilitator of storing/retrieving such decoupled data (as 
opposed to just an xml file). It sounds like there's even a user interface for 
'non-technical' staff to make acl configuration changes. That's really cool, 
and ultimately an elegant solution that will fit present and future needs.


Kind regards,
Peter


On Thu, Apr 29, 2010 at 1:24 AM, <karl.wri...@nokia.com> wrote:
Hi Peter,

I'm more than happy to hear your customer's requirements, so no problem there.  
It does seem to me that they are a bit different than what I've seen.  I think 
there is plenty of room for different flavors of solution, so please by all 
means go ahead and propose your take on it!

Karl

________________________________________
From: ext Peter Sturge [peter.stu...@googlemail.com]
Sent: Wednesday, April 28, 2010 8:07 PM
To: dev@lucene.apache.org
Cc: connectors-u...@incubator.apache.org; 
connectors-...@incubator.apache.org; lucene-...@apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

I wasn't trying to put paid to your design proposal, really the opposite - to 
highlight requirements that have been found to be necessary for 
customers/users, and 
to hopefully get the best functionality for the product. If you feel I've put 
you out on any of the issues raised, then I apologize for that, it was 
certainly not my intention.

Peter
