Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

Doug Cutting Thu, 13 Jan 2005 11:41:22 -0800

Chuck Williams wrote:

If auto-filters can provide an effective implementation for RangeQuery's
that avoids rewriting, and we can give up MultiTermQuery and PrefixQuery
in the distributed environment, then how about something like this
refinement:
  1.  No rewriting is done.

It would indeed be nice to be able to short-circuit rewriting for queries where it is a no-op. Do you have a proposal for how this could be done?

  2.  The central node maintains a cache of aggregate docFreq data that
is incrementally built on demand, and flushed after any remote node
opens a new Searcher.
  3.  The central node computes the Weights by accessing the docFreq for
each query term.  This looks the value up in the cache, or queries it
from each remote node, sums the results, and caches the result.

This seems simple and avoids a great deal of IPC traffic, especially in
the common case where popular query terms are frequently reused.

I think this sort of docFreq cache would be easy to build into either MultiSearcher or RemoteSearchable.
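
To make points 2 and 3 above concrete, here is a minimal sketch of such a cache, assuming nothing about where it would finally live. DocFreqSource and DocFreqCache are hypothetical names made up for illustration, not existing Lucene classes; a real version would presumably wrap the remote Searchables and key on Term rather than on strings. It also assumes something notifies the central node when a remote node opens a new Searcher, so that flush() can be called.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for one remote node; not a Lucene interface. */
interface DocFreqSource {
    int docFreq(String field, String text);
}

/** Aggregate docFreq cache held on the central node, built on demand. */
class DocFreqCache {
    private final List<DocFreqSource> nodes;
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    DocFreqCache(List<DocFreqSource> nodes) {
        this.nodes = nodes;
    }

    /** Returns the summed docFreq, asking the remote nodes only on a cache miss. */
    int docFreq(String field, String text) {
        // NUL cannot appear in a field name here, so it is a safe separator (assumption).
        String key = field + '\u0000' + text;
        return cache.computeIfAbsent(key, k -> {
            int sum = 0;
            for (DocFreqSource node : nodes) {
                sum += node.docFreq(field, text); // one remote call per node
            }
            return sum;
        });
    }

    /** Call after any remote node opens a new Searcher. */
    void flush() {
        cache.clear();
    }
}
```

With this shape, a popular term costs one round of remote calls after each flush, and every later lookup is answered locally.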

I presume the auto-filters get pushed out to each remote node as part of
the query?

They're not yet implemented, so we don't know. One possible implementation is for Scorers to automatically use filters for amenable query clauses. If that's how things are done then yes, the filters would essentially be a part of the query. No matter how they're implemented, we should take care to consider remote performance.

Doug
