Yes, your point about filters vs queries is a good one. I do need to move the fq building to the Lucene model.
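For illustration only, here is a minimal sketch of what building the fq from a decoupled user->search acl could look like. The acl layout, the field names (shop, source) and the function names are all hypothetical; none of them come from the uploaded contribution or from LCF, and the policy choices (no acl entry means no access, denies always win) are assumptions.

```python
def escape_term(value):
    """Quote a value as a phrase so spaces and special characters stay literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def _clause(field, values):
    """Render one field restriction, e.g. shop:("north" OR "south")."""
    terms = " OR ".join(escape_term(v) for v in values)
    return f"{field}:({terms})"


def build_fq(acl, user):
    """Build a list of fq strings for `user` from a decoupled acl.

    `acl` (hypothetical layout) maps a user name to
    {"allow": {field: [values]}, "deny": {field: [values]}}.
    Each returned string is meant to be sent as its own fq parameter,
    so the restrictions are intersected.
    """
    entry = acl.get(user)
    if entry is None:
        # Assumed policy: a user with no acl entry sees nothing.
        # This filter is intended to match no documents.
        return ["-*:*"]

    fqs = []

    # Allowed values: any one matching field clause admits the document.
    allow = {f: v for f, v in entry.get("allow", {}).items() if v}
    if allow:
        fqs.append(" OR ".join(_clause(f, allow[f]) for f in sorted(allow)))

    # Denied values: one negative filter per field, applied on top of the allows.
    deny = {f: v for f, v in entry.get("deny", {}).items() if v}
    for field in sorted(deny):
        fqs.append("-" + _clause(field, deny[field]))

    return fqs


if __name__ == "__main__":
    acl = {
        "peter": {
            "allow": {"shop": ["north", "south east"]},
            "deny": {"source": ["rss"]},
        }
    }
    for fq in build_fq(acl, "peter"):
        print("fq=" + fq)
```

Because the acl lives outside the index, a change to a user's shop access only changes the strings this function returns; nothing needs to be reindexed.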
It's true that in the case of, say, NTFS, there is already access control built in to the source files. The differences are, as you pointed out, the sources that don't have this, and also Solr indexes that hold multiple types of data (a bit of NTFS, a bit of web, some rss, etc.). It's probably true to say that most Solr indexes today contain at least some data that has no intrinsic security built in to its source.

I can see that, from an LCF perspective, the proposed model fits in with it, but is LCF really an 'all or nothing' framework with regard to repositories? Coming not from the LCF side but from the 'generic data' side of things, I thought LCF would work for 1) the authority side of things (i.e. an interface into AD et al.) and 2) possibly an interface into decoupled acl storage - i.e. the decoupled data, whether stored in a file, another Solr core, a sql db or whatever, would become the 'repository' - with perhaps the difference that it holds user->search acl rather than user->file acl.

Would LCF work in this way, or would it simply be too much work to make it practical?

Thanks,
Peter

On Thu, Apr 29, 2010 at 3:45 PM, <karl.wri...@nokia.com> wrote:

> If we aren't talking about a repository of some kind, then we aren't
> talking about using LCF. If your design point is about applying security to
> NFS via an acl-xml file, your uploaded contribution will do that just fine
> (although I think you might want to use Filters in some places you are
> currently using Querys, according to what I've learned over the past day or
> two).
>
> If a repository with security is involved, there's no benefit I can see to
> building yet another security mechanism above and beyond the one that the
> repository would provide. It's double the administration, and in that light
> only makes sense at all if there's no native security mechanism present in
> whatever your data source is.
> There are certainly a number of
> "repositories" with this characteristic, though - the web, rss feeds, file
> systems, etc.
>
> Karl
>
> ------------------------------
> *From:* ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> *Sent:* Thursday, April 29, 2010 9:56 AM
> *To:* dev@lucene.apache.org
> *Subject:* Re: FW: Solr and LCF security at query time
>
> Hi Karl,
>
> - There's a significant extra load on the repository, because every search
> result has to be checked against the repository in real time
>
> By repository, do you mean, for example, NTFS? You certainly wouldn't want,
> or need to do that at all, particularly for environments where the
> repository isn't available. That's kind of the point of having the acl
> decoupled.
>
> - It will perform very poorly on queries where there are a lot of matching
> documents, but the search user can't see most of them
>
> The performance of the filter queries would be no worse (or better) than
> any other of similar length/complexity. Essentially, the filter queries
> between the two models are just using a different set of attributes
> (acl-specific vs. intrinsic to the document). If someone felt they needed to
> build lots of super-long complex filter queries to define a set of
> allowed/denied documents, their general search performance is probably not
> going to be great anyway, and would be remedied by organizing the data more
> efficiently (which is a good idea in any case).
>
> Thanks,
> Peter
>
>
> On Thu, Apr 29, 2010 at 1:10 PM, <karl.wri...@nokia.com> wrote:
>
>> Putting access control lookup at search-result time has the following
>> benefits:
>>
>> - It sees changes right away, when the underlying repository changes
>>
>> Here are the drawbacks, as far as I can see:
>>
>> - There's a significant extra load on the repository, because every search
>> result has to be checked against the repository in real time
>> - It will perform very poorly on queries where there are a lot of matching
>> documents, but the search user can't see most of them
>>
>> Having only one general solution means that you have to pick one or the
>> other of the two models. We opted for the model we did because the
>> drawbacks were potentially severe, especially under conditions of high
>> demand. The repository load question is not a trivial one, because it
>> scales as the number of results returned, which is a potentially gigantic
>> number.
>>
>> However, I am perfectly fine with supporting both models. Your suggested
>> solution will work for some classes of problem. It seems to me that in
>> order to support it you will need a parallel infrastructure to do that. We
>> could develop that infrastructure within LCF, but it's a bit of work to do:
>>
>> (1) Output an "internal repository document security identifier" into the
>> index, in addition to tokens. This id is not the same at all as the
>> document's URI, which is what literal.id is currently set to, so a new
>> solr schema field would need to be made for this. All output connectors
>> would need to be modified to do this, and all repository connectors as well.
>> (2) Since the security identifier would be valid within the context of a
>> given repository connection, the "authority service" code that tries to
>> verify visibility of a document given the authenticated user name and
>> security identifier would need to look up the correct repository connection
>> and call a method within it - which currently doesn't exist. So we'd need
>> to write such a method for all connectors that have security.
>> (3) Since this service would have a high load, and only be used under one
>> particular model, I'd suggest actually defining a whole new webapp for it,
>> so it can be distributed/controlled independently.
>>
>> Karl
>>
>>
>> ------------------------------
>> *From:* ext Peter Sturge [mailto:peter.stu...@googlemail.com]
>> *Sent:* Thursday, April 29, 2010 5:35 AM
>> *To:* connectors-u...@incubator.apache.org
>> *Cc:* dev@lucene.apache.org; connectors-...@incubator.apache.org;
>> lucene-...@apache.org
>> *Subject:* Re: FW: Solr and LCF security at query time
>>
>> Hi Karl,
>>
>> I guess it comes down to - any solution is ultimately going to place
>> access control on a search and not on data, so there isn't much to be gained
>> by binding the access control to the data. Whatever attributes exist at
>> index time to build an acl will still be there at query time, so by making
>> the acl search-bound, the acl is decoupled from the data, allowing it to be
>> used in any use case scenario.
>>
>> Here's a typical sampling of use cases where the decoupling of acl from
>> data is required:
>>
>> One customer has a 'shop-search' requirement where, logged-in users'
>> access to various shops changes daily, sometimes 4 or 5 times a day. There
>> are several hundred such shops and 10s of millions of documents, and the
>> indexing part doesn't have ownership of any of the 'source' documents.
>>
>> Another example is a customer who has multiple sites and multiple AD
>> domains.
>> They have one domain for the UK, but a completely separate domain
>> for Gibraltar. When data is replicated to remote servers accessed by
>> Gibraltar staff, these users have no SID information in the other domain.
>>
>> An 'interesting' example of this at the extreme is 34rkl4ys Bank, where,
>> due to departmental history, they have no fewer than 85 AD domains! This of
>> course is a nightmare in itself, but trying to tie access information to
>> data at storage time is virtually impossible in this environment.
>>
>> The thing I'm trying to understand is that the decoupled approach works
>> equally well for the requirements where you do have acl information at index
>> time. I guess I'm not understanding the advantages to making schema changes
>> and binding acl to data, when there's really no need. I particularly like
>> your idea of using LCF as the facilitator of storing/retrieving such
>> decoupled data (as opposed to just an xml file). It sounds like there's even
>> a user interface for 'non-technical' staff to make acl configuration
>> changes. That's really cool, and ultimately an elegant solution that will
>> fit present and future needs.
>>
>>
>> Kind regards,
>> Peter
>>
>>
>> On Thu, Apr 29, 2010 at 1:24 AM, <karl.wri...@nokia.com> wrote:
>>
>>> Hi Peter,
>>>
>>> I'm more than happy to hear your customer's requirements, so no problem
>>> there. It does seem to me that they are a bit different than what I've
>>> seen. I think there is plenty of room for different flavors of solution, so
>>> please by all means go ahead and propose your take on it!
>>>
>>> Karl
>>>
>>> ________________________________________
>>> From: ext Peter Sturge [peter.stu...@googlemail.com]
>>> Sent: Wednesday, April 28, 2010 8:07 PM
>>> To: dev@lucene.apache.org
>>> Cc: connectors-u...@incubator.apache.org;
>>> connectors-...@incubator.apache.org; lucene-...@apache.org
>>> Subject: Re: FW: Solr and LCF security at query time
>>>
>>> Hi Karl,
>>>
>>> I wasn't trying to put paid to your design proposal, really the
>>> opposite - to highlight requirements that have been found to be necessary
>>> for customers/users, and to hopefully get the best functionality for the
>>> product. If you feel I've put you out on any of the issues raised, then I
>>> apologize for that, it was certainly not my intention.
>>>
>>> Peter