OK, I see. So your problem is fundamental. You can only check permissions AFTER you know which documents match, because you need the property values; but you cannot accurately assess whether a document is a match until you know the user's security status. So you would have to query twice unless you can intercept the scoring process for your property search. This is complicated by the fact that the properties and text are in different documents, yet you need to treat them as one for all Lucene purposes.
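A minimal in-memory sketch of that ordering problem, using the article/property table from Adrian's reply further down the thread. The Article record, the search method and the canSee predicate are hypothetical names, not Lucene or Lucene.Net APIs, and each article's own text is assumed to be just its id. What it shows: the security check takes a property value as input, so it cannot run until a document has matched and that value is known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of Adrian's example; none of these names are
// Lucene or Lucene.Net APIs.
class ValueSecurityDemo {

    // An article with its property values. For simplicity the article's own
    // text is taken to be just its id ("A", "B", "C").
    record Article(String id, List<String> propertyValues) {}

    // Returns ids of articles whose own text plus *visible* property values
    // contain every query term (article-level AND). canSee needs the property
    // value, so it can only be evaluated once that value is known.
    static List<String> search(List<Article> articles, List<String> terms,
                               Predicate<String> canSee) {
        List<String> hits = new ArrayList<>();
        for (Article article : articles) {
            List<String> visibleTerms = new ArrayList<>();
            visibleTerms.add(article.id());
            for (String value : article.propertyValues()) {
                if (canSee.test(value)) { // per-value security check
                    visibleTerms.add(value);
                }
            }
            if (visibleTerms.containsAll(terms)) {
                hits.add(article.id());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Article> table = List.of(
                new Article("A", List.of("X", "J")),
                new Article("B", List.of("Y", "K")),
                new Article("C", List.of("Z", "L")));
        List<String> query = List.of("B", "Y");
        System.out.println(search(table, query, value -> true));                 // [B]
        System.out.println(search(table, query, value -> !value.contains("Y"))); // []
    }
}
```

The same query gives different answers for the two users only because the check is applied to values after matching, which is why it cannot simply be folded into the query beforehand.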
If you were able to redesign the index scheme to merge each document and its properties into one document, then you could probably create an initial Lucene query with a HitCollector whose job would be to construct a list of candidate matches and, at the same time, build a query/filter combination for each potential match (or group of matches) from the security properties the user is allowed to see (plus, of course, the initial query). I don't think even that would be pretty, but it could probably be made efficient, and it would reduce the need to leave the security check until later. Without that, I think you would need some sort of custom Scorer implementation, which is not a subject I know much about. You might want to post this question on the main Java Lucene list, as it isn't .NET-specific and has a broader readership.

Yours,
Moray
-------------------------------------
Moray McConnachie
Director of IT +44 1865 261 600
Oxford Analytica
http://www.oxan.com

-----Original Message-----
From: Adrian Banks [mailto:[email protected]]
Sent: 08 September 2009 13:54
To: [email protected]
Subject: RE: Combining hits from multiple documents into a single hit

Whether or not a user can see a property is not based on the property itself, but on the value of the property. I therefore cannot put the extra security conditions into the query up front, as I don't know the value to filter by. As an example:

Article | Property 1 | Property 2
--------+------------+-----------
A       | X          | J
B       | Y          | K
C       | Z          | L

If a user can see everything, then searching for "B and Y" will return a single search result for article B. If another user cannot see any property whose value contains Y, then searching for "B and Y" will return no hits. I have no way of knowing up front which values a user can and cannot see. The only way to tell is to perform the security check (currently done when filtering a hit from a field in the document), which I obviously cannot do for every possible data value for each user.
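Moray's earlier suggestion (quoted below) of writing the security into the query can be sketched as a small rewrite over a hypothetical two-node query model; Term, And and expand are illustrative names, not the Lucene Query classes. In Lucene itself the equivalent would walk a BooleanQuery's clauses and replace each TermQuery with a BooleanQuery of SHOULD clauses, one per allowed field. The output strings rely on the query parser's default OR operator between space-separated clauses.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of Moray's query expansion over a hypothetical two-node query model
// (Term, And). This is not the Lucene Query API.
class QueryExpansion {

    interface Query {}
    record Term(String text) implements Query {}
    record And(Query left, Query right) implements Query {}

    // Rewrites every term as a space-separated (OR) list over the fields this
    // user may see; AND nodes are kept, with each side parenthesised.
    static String expand(Query query, List<String> allowedFields) {
        if (query instanceof Term term) {
            return allowedFields.stream()
                    .map(field -> field + ":" + term.text())
                    .collect(Collectors.joining(" "));
        }
        if (query instanceof And conj) {
            return "(" + expand(conj.left(), allowedFields) + ") AND ("
                    + expand(conj.right(), allowedFields) + ")";
        }
        throw new IllegalArgumentException("Unsupported query node: " + query);
    }

    public static void main(String[] args) {
        // A user allowed to see securedproperty1 and securedproperty3 only.
        List<String> fields = List.of("maintext", "securedproperty1", "securedproperty3");
        System.out.println(expand(new Term("foo"), fields));
        System.out.println(expand(new And(new Term("foo"), new Term("bar")), fields));
    }
}
```

Running main prints Moray's two example strings. Note that allowedFields has to be known before the search runs; with value-dependent security as Adrian describes, that list does not exist up front, which is why this approach does not fit his case.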
Adrian

-----Original Message-----
From: Moray McConnachie [mailto:[email protected]]
Sent: 08 September 2009 13:34
To: [email protected]
Subject: RE: Combining hits from multiple documents into a single hit

An answer that could solve both the AND issue and the security issue would be to write the security into the query. You don't say how you are generating queries at the moment, but for a user who has access (say) to securedproperty1 and securedproperty3 but not securedproperty2, it should be reasonably easy to expand a simple text query from just "foo" to "maintext:foo securedproperty1:foo securedproperty3:foo". Similarly, you could expand "foo AND bar" to "(maintext:foo securedproperty1:foo securedproperty3:foo) AND (maintext:bar securedproperty1:bar securedproperty3:bar)".

This could be done using a method that takes any Lucene Query, walks through it looking for TermQueries, and expands each TermQuery into the requisite form. You'd have to think about RangeQueries and other types of query, I guess. You could even write this as a wrapper around the main search method.

Otherwise you could use a HitCollector. Regarding the problem with AND, are you saying that in standard Lucene, if you do a "foo AND bar" query across all fields, "foo" and "bar" must be stored in the same field to get a result? If so, I didn't realise that, but you could also deal with this using query expansion on boolean queries.

Hope this is helpful,
Moray

-----Original Message-----
From: Adrian Banks [mailto:[email protected]]
Sent: 08 September 2009 13:12
To: [email protected]
Subject: Combining hits from multiple documents into a single hit

I am trying to get a particular search to work and it is proving problematic. The actual source data is quite complex but can be summarised by the following example:

I have articles that are indexed so that they can be searched.
Each article also has multiple properties associated with it, which are also indexed and searchable. When users search, they can get hits in either the main article or the associated properties. Regardless of where a hit is achieved, the article is returned as the search hit (i.e. the properties are never a hit in their own right).

Now for the complexity: each property has security on it, which means that any given user may or may not be able to see it. If a user cannot see a property (based on its value for each article), they obviously do not get a search hit in it. This security check is proprietary and cannot be done using the typical mechanism of storing a role in the index alongside the other fields in the document.

I currently have an index that contains the articles and properties indexed separately (i.e. an article is indexed as a document, and each property has its own document). When a search happens, a hit in article A or a hit in any of the properties of article A should be classed as a hit for article A alone, with the scores combined. To achieve this originally, Lucene v1.3 was modified by changing BooleanQuery to use a custom Scorer that applied the security check and treated two hits in different documents as a hit in a single document. I am trying to upgrade to the latest version (v2.3.2; I am using Lucene.Net), but ideally without having to modify Lucene in any way.

An additional problem occurs if I do an AND search. If an article contains the word foo and one of its properties contains the word bar, then searching for "foo AND bar" should return the article as a hit. My current code deals with this inside the custom Scorer. Any ideas how/if this can be done?
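As a rough sketch of the rule "a hit in any of article A's documents counts as a hit for A, with the scores combined", the code below groups raw document hits by parent article id, drops property hits the user may not see, and sums what is left. It assumes each property document stores its parent article's id, that canSee stands in for the proprietary value-based check, and that summing is an acceptable way to combine scores; DocHit and combine are hypothetical names, not Lucene APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ArticleHitCombiner {

    // One raw hit from the index: either the article's own document
    // (propertyValue == null) or one of its property documents.
    // articleId is assumed to be stored on every document.
    record DocHit(String articleId, String propertyValue, float score) {}

    // Sums scores per article, skipping property hits the user may not see.
    static Map<String, Float> combine(List<DocHit> hits, Predicate<String> canSee) {
        Map<String, Float> byArticle = new LinkedHashMap<>();
        for (DocHit hit : hits) {
            boolean isProperty = hit.propertyValue() != null;
            if (isProperty && !canSee.test(hit.propertyValue())) {
                continue; // hidden property: contributes nothing to its article
            }
            byArticle.merge(hit.articleId(), hit.score(), Float::sum);
        }
        return byArticle;
    }

    public static void main(String[] args) {
        List<DocHit> hits = List.of(
                new DocHit("A", null, 0.5f),       // article A's own document
                new DocHit("A", "bar", 0.25f),     // a visible property of A
                new DocHit("B", "secret", 1.0f));  // a hidden property of B
        System.out.println(combine(hits, value -> !value.equals("secret"))); // {A=0.75}
    }
}
```

This covers OR-style combination only; on its own it does not give "foo AND bar" across an article and its properties.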
I am thinking along the lines of using a custom HitCollector and passing that into the search, but when doing the boolean search "foo AND bar", execution never reaches my HitCollector, because the ConjunctionScorer filters out all of the results from the sub-queries before they get there.

Thanks,
Adrian
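One possible way around the ConjunctionScorer behaviour, sketched in plain Java rather than with Lucene classes, is not to hand the AND to Lucene at all: run each clause as its own search, map every matching document to its article, and intersect at article level. This is a swapped-in approach, not the custom-Scorer technique the v1.3 modification used. ClauseCollector's collect(doc, score) only mirrors the shape of the callback Lucene 2.x's HitCollector exposes; the doc-to-article lookup (e.g. from a stored field or a cache) and the visible check are assumptions, and all names here are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

class ArticleLevelAnd {

    // Hypothetical per-clause collector: records which articles a clause hit.
    static final class ClauseCollector {
        private final Map<Integer, String> docToArticle; // assumed doc -> parent article lookup
        private final IntPredicate visible;               // assumed value-based security check
        final Set<String> articles = new LinkedHashSet<>();

        ClauseCollector(Map<Integer, String> docToArticle, IntPredicate visible) {
            this.docToArticle = docToArticle;
            this.visible = visible;
        }

        void collect(int doc, float score) {
            if (visible.test(doc)) {
                articles.add(docToArticle.get(doc));
            }
        }
    }

    // Each int[] stands in for the documents one clause matched when run as its
    // own search. An article satisfies the AND if every clause hit at least one
    // of its visible documents, not necessarily the same document.
    static Set<String> articleLevelAnd(List<int[]> docsPerClause,
                                       Map<Integer, String> docToArticle,
                                       IntPredicate visible) {
        Set<String> result = null;
        for (int[] docs : docsPerClause) {
            ClauseCollector collector = new ClauseCollector(docToArticle, visible);
            for (int doc : docs) {
                collector.collect(doc, 1.0f); // scores are ignored in this sketch
            }
            if (result == null) {
                result = new LinkedHashSet<>(collector.articles);
            } else {
                result.retainAll(collector.articles);
            }
        }
        if (result == null) {
            return Set.of();
        }
        return result;
    }

    public static void main(String[] args) {
        // Docs 0 and 2 are article texts containing "foo"; docs 1 and 3 are
        // property documents containing "bar". Doc 1 belongs to article A.
        Map<Integer, String> docToArticle = Map.of(0, "A", 1, "A", 2, "B", 3, "C");
        List<int[]> docsPerClause = List.of(new int[] {0, 2}, new int[] {1, 3});
        System.out.println(articleLevelAnd(docsPerClause, docToArticle, doc -> true));     // [A]
        System.out.println(articleLevelAnd(docsPerClause, docToArticle, doc -> doc != 1)); // []
    }
}
```

The trade-off is one search per clause plus a document-to-article lookup for every hit, in exchange for not modifying Lucene; nested boolean queries would need the same treatment applied recursively.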
