Doug Cutting wrote: > That sounds hairy. Why not just add a single new method: > > boolean Query.isRewritten() { returns true; } > > Then override this in TermQuery, PhraseQuery and SpanQuery to return > false, and in BooleanQuery to walk its clauses and return true iff any > of them return true. As an optimization, RemoteSearchable could avoid > calling rewrite() when this is true.
I agree that simplification makes more sense. This is the same method as what I had called rewriteRequired (which to me is a more intuitive name since isRewritten()==true doesn't mean the rewrite is done, rather that it needs to be done). The other parts of the mechanism I first proposed do things that upon reflection we don't really need to do. Chuck > -----Original Message----- > From: Doug Cutting [mailto:[EMAIL PROTECTED] > Sent: Friday, January 14, 2005 9:33 AM > To: Lucene Developers List > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with > Similarity.docFreq() ? > > Chuck Williams wrote: > > Doug Cutting wrote: > > > It would indeed be nice to be able to short-circuit rewriting for > > > queries where it is a no-op. Do you have a proposal for how this > > could > > > be done? > > > > First, this gets into the other part of Bug 31841. I don't believe > > MultiSearcher.rewrite() is ever called. Rewriting is done in the > > Weight's, which invoke the rewrite() method of the Searcher, which is > > always the Seacher invoked by the MultiSearcher, not the MultiSearcher > > itself. > > This would be fixed by the proposal under consideration. Weights would > be constructed much earlier, using the top-level Searcher, so rewrites > would use this too. > > > In fact, MultiSearcher.rewrite() is broken. It requires > > Query.combine() which is unsupported except for the derived queries > > (i.e., those for which rewriting is not a no-op). When I added > > topmostSearcher to get the Weight's to call the > MultiSearcher.docFreq(), > > that also caused them to call MultiSearcher.rewrite() which blows up > on, > > for example, a simple TermQuery, because there is no > > TermQuery.combine(). That's why my patch contains a new default > > implementation for Query.combine() (which as noted in the bug report > is > > probably not a good idea in general). > > > > So, I don't believe there is any valid rewrite() implementation for > > MultiSearcher to start from, unless I've completely misunderstood > > something. > > It looks like MultiSearcher.rewrite() was never implemented correctly > since it was never called -- a latent bug. It only needs to be called > when queries are rewritten to something different: > > public Query rewrite(Query original) throws IOException { > Query[] queries = new Query[searchables.length]; > boolean changed = false; > for (int i = 0; i < searchables.length; i++) { > Query rewritten = searchables[i].rewrite(original); > changed = !rewritten.equals(original); > queries[i] = rewritten; > } > if (changed) { > return original.combine(queries); > } else { > return original; > } > } > > Then we'll need an implementation of combine() for all query types. The > implementation for BooleanQuery is fairly simple: combine() each of the > corresponding clauses. For TermQuery, PhraseQuery and SpanQuery combine > should create a deduplicated OR. Derived queries already have an > implementation. > > > To address the question above, RemoteSearchable.rewrite() should be a > > no-op, i.e. always return this. For good error handling, it should > > verify that the query does not require rewriting. This requires some > > mechanism to determine whether or not a query requires rewriting. The > > challenge here is that some query types have a non-trivial rewrite() > > method not because they require rewriting, but because they might have > > subqueries that require rewriting (e.g., BooleanQuery). Other query > > types (e.g., MultiTermQuery) always require rewriting, while those > that > > implement Weight's never require it. I think an upward > incompatibility > > is required in the API to address this. > > > > If that is acceptable, then this could work: > > 1. Add a new interface called Rewritable that specifies a boolean > > rewriteRequired() method. > > 2. Have Query implement Rewritable but NOT provide an > implementation > > for rewriteRequired(). This will force all applications to add > support > > for this in order to upgrade. > > 2. Change all the Weight's to call Query.maybeRewrite() instead of > > Query.rewrite(). > > 3. Have Query.maybeRewrite() only call Query.rewrite() if > > Query.rewriteRequired() is true. > > 4. Have RemoteSearchable.maybeRewrite() throw an Exception if > > Query.rewriteRequired() is true. > > 5. Implement rewriteRequired() for all the built-in Query types > > (which is either true for derived queries, false for primitive queries, > > or an or of rewriteRequired() for all the subqueries). > > That sounds hairy. Why not just add a single new method: > > boolean Query.isRewritten() { returns true; } > > Then override this in TermQuery, PhraseQuery and SpanQuery to return > false, and in BooleanQuery to walk its clauses and return true iff any > of them return true. As an optimization, RemoteSearchable could avoid > calling rewrite() when this is true. > > Doug > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]