Doug Cutting wrote:
  > It would indeed be nice to be able to short-circuit rewriting for
  > queries where it is a no-op.  Do you have a proposal for how this could
  > be done?

First, this gets into the other part of Bug 31841.  I don't believe
MultiSearcher.rewrite() is ever called.  Rewriting is done in the
Weights, which invoke the rewrite() method of the Searcher, which is
always the Searcher invoked by the MultiSearcher, not the MultiSearcher
itself.  In fact, MultiSearcher.rewrite() is broken.  It requires
Query.combine(), which is unsupported except for the derived queries
(i.e., those for which rewriting is not a no-op).  When I added
topmostSearcher to get the Weights to call MultiSearcher.docFreq(),
that also caused them to call MultiSearcher.rewrite(), which blows up on,
for example, a simple TermQuery, because there is no
TermQuery.combine().  That's why my patch contains a new default
implementation for Query.combine() (which, as noted in the bug report, is
probably not a good idea in general).

So, I don't believe there is any valid rewrite() implementation for
MultiSearcher to start from, unless I've completely misunderstood
something.

To address the question above, RemoteSearchable.rewrite() should be a
no-op, i.e. always return the query it was given.  For good error
handling, it should verify that the query does not require rewriting.
This requires some mechanism to determine whether or not a query
requires rewriting.  The challenge here is that some query types have a
non-trivial rewrite() method not because they require rewriting, but
because they might have subqueries that require rewriting (e.g.,
BooleanQuery).  Other query types (e.g., MultiTermQuery) always require
rewriting, while those that implement Weights never require it.  I think
an upward incompatibility is required in the API to address this.

If that is acceptable, then this could work:
  1.  Add a new interface called Rewritable that specifies a boolean
rewriteRequired() method.
  2.  Have Query implement Rewritable but NOT provide an implementation
for rewriteRequired().  This will force all applications to add support
for this in order to upgrade.
  3.  Change all the Weights to call Query.maybeRewrite() instead of
Query.rewrite().
  4.  Have Query.maybeRewrite() only call Query.rewrite() if
Query.rewriteRequired() is true.
  5.  Have RemoteSearchable.maybeRewrite() throw an Exception if
Query.rewriteRequired() is true.
  6.  Implement rewriteRequired() for all the built-in Query types
(which is either true for derived queries, false for primitive queries,
or the logical OR of rewriteRequired() for all the subqueries).
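
To make the proposal above concrete, here is a rough, self-contained
sketch in Java.  It is not a patch against Lucene: every class below is
a hypothetical stand-in that only borrows the names from this thread,
PrefixQuery takes a plain list of strings as an assumed substitute for
the index terms, and RemoteSearchableSketch is an invented name for the
remote-side check.  Weights and scoring are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Lucene's real classes.
interface Rewritable {
    boolean rewriteRequired();
}

abstract class Query implements Rewritable {
    // Deliberately no default rewriteRequired(): each subclass must answer.
    abstract Query rewrite();

    // Rewrite only when the query reports that it needs it.
    Query maybeRewrite() {
        return rewriteRequired() ? rewrite() : this;
    }
}

// Primitive query: rewriting is a no-op.
final class TermQuery extends Query {
    final String field;
    final String text;

    TermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override public boolean rewriteRequired() { return false; }
    @Override Query rewrite() { return this; }
}

// Derived query: always has to be expanded into primitive queries.
final class PrefixQuery extends Query {
    final String field;
    final String prefix;
    final List<String> indexTerms; // assumed stand-in for the terms of an index

    PrefixQuery(String field, String prefix, List<String> indexTerms) {
        this.field = field;
        this.prefix = prefix;
        this.indexTerms = indexTerms;
    }

    @Override public boolean rewriteRequired() { return true; }

    @Override Query rewrite() {
        BooleanQuery expanded = new BooleanQuery();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                expanded.add(new TermQuery(field, term));
            }
        }
        return expanded;
    }
}

// Composite query: needs rewriting only if at least one clause does.
final class BooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();

    void add(Query clause) { clauses.add(clause); }

    @Override public boolean rewriteRequired() {
        for (Query clause : clauses) {
            if (clause.rewriteRequired()) return true;
        }
        return false;
    }

    @Override Query rewrite() {
        BooleanQuery rewritten = new BooleanQuery();
        for (Query clause : clauses) {
            rewritten.add(clause.maybeRewrite());
        }
        return rewritten;
    }
}

// Remote side: never rewrites, and rejects queries that still need it.
final class RemoteSearchableSketch {
    Query maybeRewrite(Query query) {
        if (query.rewriteRequired()) {
            throw new IllegalStateException(
                "query must be rewritten before it reaches a remote node");
        }
        return query;
    }
}

class RewriteSketchDemo {
    public static void main(String[] args) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery("body", "lucene"));
        query.add(new PrefixQuery("body", "luc", Arrays.asList("lucene", "lucid", "nutch")));

        Query rewritten = query.maybeRewrite();           // expands the prefix clause
        new RemoteSearchableSketch().maybeRewrite(rewritten); // accepted as-is
        System.out.println("rewrite still required: " + rewritten.rewriteRequired());
    }
}
```

In this sketch the central node would call maybeRewrite() once and ship
the result; the remote side only checks and never expands anything.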

Maybe there's a better way, but this should work.  It does require an
extra pass over the query.  There is a potential hole if there are
applications that implement new primitive queries, i.e. have Weights
that directly call Query.rewrite().  This hole could be (mostly) plugged
by renaming rewrite(), but that would introduce another upward
incompatibility.

An optimization could omit the call to rewriteRequired() in
Query.maybeRewrite(), as this mechanism is really only needed in
RemoteSearchable (and could be beneficial in MultiSearcher).

There is still the need to properly implement Query.combine() for all
query types (which is greatly simplified by a good default
implementation).

Chuck

  > -----Original Message-----
  > From: Doug Cutting [mailto:[EMAIL PROTECTED]
  > Sent: Thursday, January 13, 2005 11:41 AM
  > To: Lucene Developers List
  > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with
  > Similarity.docFreq() ?
  > 
  > Chuck Williams wrote:
  > > If auto-filters can provide an effective implementation for RangeQuery's
  > > that avoids rewriting, and we can give up MultiTermQuery and PrefixQuery
  > > in the distributed environment, then how about something like this
  > > refinement:
  > >   1.  No rewriting is done.
  > 
  > It would indeed be nice to be able to short-circuit rewriting for
  > queries where it is a no-op.  Do you have a proposal for how this could
  > be done?
  > 
  > >   2.  The central node maintains a cache of aggregate docFreq data that
  > > is incrementally built on demand, and flushed after any remote node
  > > opens a new Searcher.
  > >   3.  The central node computes the Weights by accessing the docFreq for
  > > each query term.  This looks the value up in the cache, or queries it
  > > from each remote node, sums the results, and caches the result.
  > >
  > > This seems simple and avoids a great deal of IPC traffic, especially in
  > > the common case where popular query terms are frequently reused.
  > 
  > I think this sort of a docFreq cache would be easy to build into either
  > MultiSearcher or RemoteSearchable.
  > 
  > > I presume the auto-filters get pushed out to each remote node as part of
  > > the query?
  > 
  > They're not yet implemented, so we don't know.  One implementation would
  > be that Scorers would automatically use filters for amenable query
  > clauses.  If that's the way things are done then yes, the filters would
  > essentially be a part of the query.  No matter how they're implemented,
  > we should take care to consider remote performance.
  > 
  > Doug
  > 
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: [EMAIL PROTECTED]
  > For additional commands, e-mail: [EMAIL PROTECTED]

