Hi Michael,

ManifoldCF does not change the underlying numbers, but the Solr query component implements the document restriction query in such a way that:
(a) that part of the query can be cached internally by Solr, so that it does not have to be calculated more than once for any given user, and (b) it doesn't mess with the rest of the query or change the scoring.

I also think that clauses of the size you mentioned (on the order of 1000 terms) are not very large by Lucene standards, given the way the Solr query component implements the logic, so I would not expect the performance to be unusably bad. This is based on experience with other repositories that ManifoldCF supports, which require even MORE access tokens per user. But that's something you may need to experiment with to convince yourself of.

Thanks,
Karl

On Sun, Mar 6, 2011 at 12:31 PM, Michael Roberts <[email protected]> wrote:
> Hi, Karl.
>
> Thanks for the info and for offering to help. MCF and the book both look very interesting as a means of loading Lucene/Solr via crawling different data repositories. I've spoken to my customer about buying some copies of the book in the interest of solving that problem. However, after reading some more MCF literature online and the TOC for your book, it's not quite clear to me how MCF resolves the problem of record-level security.
>
> Say each record in Oracle has a label, and we use MCF to crawl Oracle and load Lucene/Solr while retaining the label as a field. You have access to labels 1, 2, and 3, and I have access to labels 2, 3, and 4. Right now, if you search Solr, I would be passing in your labels as a Solr query param (e.g., &fq=labels:(1+2+3)), and if I search Solr, I would be passing in my own labels as &fq=labels:(2+3+4).
>
> Does MCF alter this equation somehow? Does it refactor and hash, or somehow otherwise abstract away the labels so they don't need to be added to every query?
>
> Thanks!
>
> On Tue, Mar 1, 2011 at 8:56 AM, Karl Wright <[email protected]> wrote:
>>
>> Hi Michael,
>>
>> The scenario you describe does not sound particularly problematic.
>> The Solr SearchComponent presented in SOLR-1895 uses Lucene filters to perform the necessary security-based query restriction. These can be cached, or so I am told, and thus computed essentially once per user. The number 1000 to 2000 does not sound scary either; we've got some repositories that typically produce more than that, although if you get up to the 10,000 range you will start to notice the delay.
>>
>> If you want to pursue this, I'd be happy to help get the necessary support added to the existing database connector. It also sounds like a specialized Oracle authority connector may be needed. I guess what would be needed to move forward would be the following:
>>
>> - a ManifoldCF ticket in Jira describing the problem (https://issues.apache.org/jira)
>> - a detailed description in the ticket of how one would obtain OLS access tokens, given a list of id values meant to match a row id column, e.g.:
>>
>>   select the_id,??? from table where the_id in (?,?,?,?...)
>>
>> - a detailed description of how a standard Active Directory user name (e.g. [email protected]) would be mapped to a list of access tokens
>> - any other complexities to the model, e.g. the ability to deny records, and how that might work.
>>
>> When this is all together I will create a branch to work on your ticket, and you can test the result before I commit it back to ManifoldCF trunk.
>>
>> Thanks!
>> Karl
>>
>> On Tue, Mar 1, 2011 at 8:39 AM, Michael Roberts <[email protected]> wrote:
>> > Thanks. I will look into the book and online material.
>> >
>> > OLS tokens would basically be Oracle decimals, although most are short (e.g., 1111). Each user has at least hundreds of labels, and more-privileged users have 1000-2000 right now.
>> > Right now, we query the DB for the user's labels, then stick them in a Solr fq param, so we're basically saying that we want the intersection of the result set matching the user's query and the result set matching the user's privileges.
>> >
>> > Thanks again
>> >
>> > On Mon, Feb 28, 2011 at 10:18 PM, Karl Wright <[email protected]> wrote:
>> >>
>> >> ManifoldCF includes a model for search engine security enforcement on a per-document basis. However, the existing database connector does not support OLS at this time; that would have to be added, although that is not very hard.
>> >>
>> >> The real question is whether the ManifoldCF security model will improve the parameters of your problem, which I cannot answer without further information. If you want to learn more, the best description of the model can be found in ManifoldCF in Action. There's a preliminary electronic access program called MEAP which you can sign up for; see http://www.manning.com/wright. You'll want to read Chapter 4, which has not yet been released but will be in a couple of weeks. The chapter includes example Solr integration code, which is similar to the code included in the patch for the ticket SOLR-1895. Alternatively, there's a fair bit of online material that attempts to explain the security model, which you might want to examine to see if you think the model would work well for this environment. The goal would be to learn what an OLS "access token" should look like, and how many of these there would likely be per user. If it's less than a couple of thousand, it's a viable model.
>> >>
>> >> Please let us know your thoughts.
>> >> Karl
>> >>
>> >> On Mon, Feb 28, 2011 at 9:14 PM, Michael Roberts <[email protected]> wrote:
>> >> >
>> >> > Our corporate policy dictates that when we search Solr, we match the user's potentially thousands of OLS labels against a labels field in the index. This inefficiency results in enormous requests that result in thousands of Boolean comparisons per query attempt. Someone on the Solr-user mailing list suggested that Manifold might be used to remedy the situation. Is that correct, and if so, is anyone thinking about Oracle OLS support?
>> >> >
>> >> > Thanks!
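For reference, here is a minimal Java sketch of the approach Michael describes in this thread: take the labels fetched from the database for a user and turn them into a Solr fq parameter such as fq=labels:(1 2 3). This is not the SOLR-1895 search component. The class and method names (LabelFilter, buildFilterQuery, toFqParam) are hypothetical, the field name labels comes from the example in the thread, and the labels are assumed to be plain numeric strings that need no Lucene query escaping. Sorting and de-duplicating the labels means the same label set always yields the same filter string, which should make it easier for Solr's filter cache to reuse the cached result across that user's queries.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: turns a user's OLS labels into a Solr filter query.
// Assumes labels are plain numeric strings (e.g. "1111"), so no query escaping is done.
class LabelFilter {

    // Builds e.g. labels:(1 2 3). The TreeSet sorts (lexicographically) and
    // de-duplicates, giving a canonical string for a given label set.
    static String buildFilterQuery(String field, List<String> labels) {
        return field + ":(" + String.join(" ", new TreeSet<String>(labels)) + ")";
    }

    // URL-encodes the filter query as an fq request parameter.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String toFqParam(String filterQuery) {
        return "fq=" + URLEncoder.encode(filterQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fq = buildFilterQuery("labels", List.of("3", "1", "2", "2"));
        System.out.println(fq);            // labels:(1 2 3)
        System.out.println(toFqParam(fq)); // fq=labels%3A%281+2+3%29
    }
}
```

Because an fq restricts the result set without contributing to the score, the user's main query and its ranking are left alone, in line with point (b) in Karl's reply. Whether 1000-2000 terms per filter performs acceptably is still something to measure on the real index.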
