Robert,

Thanks for your questions. Things are beginning to fall into place
(see http://issues.apache.org/jira/browse/LUCENE-584):

On Saturday 08 July 2006 14:14, robert engels wrote:
> Is that really necessary for a filter? It seems that a filter implies
> efficiency over a "scoring", and that filters should be able to be

The proposed Matcher is a superclass of Scorer formed by leaving out all
the methods dealing with score values.

> evaluated in a chained (or priority queue) fashion fairly efficiently

The current DisjunctionSumScorer has a priority queue.
I have to say that I did not yet consider a filter clause to a boolean
query that is based on a disjunction of filters: in this case the
"should" occurrence makes sense, but calling it a query is overdoing it,
the disjunction would be a Filter itself.
In principle, it is possible to evaluate a disjunction over filters
during a query search, and it might even make sense when the disjunction
is skipTo'd into infrequently as one of the required clauses in a boolean
query. I have no idea whether this would be useful in practice.

Also, in the same way as for top level disjunction queries, for filters
there are more efficient methods of dealing with top level disjunction
than a priority queue, see for example RangeFilter that collects all
matching docs in a BitSet by iterating the TermScorers in the range one
by one.

The distinction between top level evaluation and nested evaluation is
in the proposed Matcher: it has a match(MatchCollector) method for the
top level, and the doc(), next() and skipTo() can be used for nested
evaluation. The same distinction exists in Scorer:
score(HitCollector, ...) and roughly the rest.

> without any need for "rewrites".

Rewriting of a query is a way to make an association between a query
and one or more index readers. The same association is currently present
for a Filter in the bits(IndexReader) method, proposed to be deprecated.
Perhaps the proposed getMatcher(IndexReader) method should be called
Filter.rewrite(IndexReader), just as Query.rewrite(IndexReader).

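To make the top level versus nested distinction concrete, here is a minimal sketch of what such a Matcher could look like. Only the method names match(MatchCollector), doc(), next() and skipTo() come from the proposal; everything else (the MatchCollector signature, SortedDocsMatcher, AndMatcher, MatcherSketch) is hypothetical and only there for illustration. It assumes that skipTo() behaves like the existing Scorer.skipTo(), that is, it always advances at least once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collector: receives each matching document number. */
interface MatchCollector {
    void collect(int doc);
}

/** Sketch of the proposed Matcher: a Scorer without the score methods. */
abstract class Matcher {
    /** Current document; valid only after next() or skipTo() returned true. */
    abstract int doc();
    /** Advances to the next match; returns false when exhausted. */
    abstract boolean next();
    /** Advances at least once, to the first match >= target. */
    abstract boolean skipTo(int target);

    /** Top level evaluation: pushes all remaining matches to the collector. */
    void match(MatchCollector collector) {
        while (next()) {
            collector.collect(doc());
        }
    }
}

/** Illustrative Matcher over a sorted array of document numbers. */
class SortedDocsMatcher extends Matcher {
    private final int[] docs;
    private int pos = -1;

    SortedDocsMatcher(int... sortedDocs) { this.docs = sortedDocs; }

    int doc() { return docs[pos]; }

    boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }
}

/** Nested evaluation: two required ("must") clauses combined via skipTo(). */
class AndMatcher extends Matcher {
    private final Matcher a, b;
    private boolean bStarted = false;
    private boolean done = false;

    AndMatcher(Matcher a, Matcher b) { this.a = a; this.b = b; }

    int doc() { return a.doc(); }

    boolean next() { return finish(!done && a.next() && align()); }

    boolean skipTo(int target) { return finish(!done && a.skipTo(target) && align()); }

    private boolean finish(boolean found) {
        if (!found) done = true;
        return found;
    }

    /** Leapfrogs a and b forward until both are on the same document. */
    private boolean align() {
        if (!bStarted) {
            bStarted = true;
            if (!b.next()) return false;
        }
        while (a.doc() != b.doc()) {
            if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return false;
            } else {
                if (!b.skipTo(a.doc())) return false;
            }
        }
        return true;
    }
}

class MatcherSketch {
    /** Runs the top level match() and returns the collected documents. */
    static List<Integer> matches(Matcher m) {
        List<Integer> out = new ArrayList<>();
        m.match(doc -> out.add(doc));
        return out;
    }

    public static void main(String[] args) {
        Matcher both = new AndMatcher(
            new SortedDocsMatcher(1, 3, 5, 8, 13),
            new SortedDocsMatcher(2, 3, 8, 9, 13, 21));
        System.out.println(matches(both)); // prints [3, 8, 13]
    }
}
```

The AndMatcher never looks at a score, which is the point: the conjunction logic only needs doc(), next() and skipTo(), so it could be shared between Scorers and "rewritten" Filters.
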
> With the new incremental updates of a filter (based upon a query) it
> seems that the newly proposed filtering could be far less efficient.

A Filter can be composed in the same way as an IndexReader can use
multiple segments. Also, document deletion in a segment is currently
done by a special purpose bit set. For incremental updates, the
"rewriting" of a filter could be limited to the filter component
associated with the newly added segment(s).

> I think a filter change that just removes the BitSet dependency is
> all that is needed, and anything else is overkill, but I admit I am

I thought so, too. But then I realized that there are many things shared
between current Scorers and Filters. These things are dealing mostly
with matching and not at all with scoring.

> probably missing something here.

Perhaps a method to provide a complete Explanation of why a document
matches, or does not match, a filtered query?

> If these changes will eventually allow for efficient filtering based
> upon non-indexed stored fields I am all for it.

For the non indexed case, there is no choice but to read all stored data
and evaluate a boolean function on the field of each document. I think
the only efficiency to be gained there is in reading the stored fields,
but iirc that has been fixed.

For the indexed case a TermScorer is a Scorer is a proposed Matcher.
The norms can already be left out, so the only things "left to be left
out" are the term frequencies and positions. Once that is done there is
no more need to use a non-indexed stored field for filtering, because an
indexed-only field would always be more efficient in indexed data size.

Regards,
Paul Elschot

> On Jul 8, 2006, at 2:24 AM, Paul Elschot wrote:
>
> > On Saturday 08 July 2006 05:44, robert engels wrote:
> >> Agreed. The interface I proposed supports both sequential and random
> >> access to the filter - hiding the implementation.
> >
> > For query searching, random access to a Filter is only needed
> > in the forward direction, e.g. by nextInclude(docNr) or skipTo(docNr).
> >
> > As for why it's so involved:
> >
> > Making a "rewritten" Filter work more like a Scorer has the advantage
> > that combinations of filters can (also) be evaluated using the same
> > mechanisms as currently existing for Scorers. For this, some additions
> > to the existing code will be needed, like adding an
> > add(Filter, BooleanClause.Occur) to BooleanQuery, and a similar
> > addition of a Matcher (proposed superclass of Scorer to "rewrite" a
> > Filter to) to some of the underlying scorers.
> > Such occurrences of filters are only "must" and "must not", "should"
> > doesn't make sense because there is no score value.
> >
> > Also, it makes sense to have an explain() method for a "rewritten"
> > Filter, because it can be for searching a query.
> >
> > Regards,
> > Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]