Background: In http://issues.apache.org/bugzilla/show_bug.cgi?id=34673, Yonik Seeley proposes a ConstantScoreQuery based on a Filter. In http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg08007.html I proposed a mechanism to promote the use of Filters. Throughout, Paul Elschot has hinted that there might be a better way.

Here's another proposal, tackling many of the same issues:

1. Add two methods to Query.java:

  public boolean constantScoring();
  public void constantScoring(boolean constantScoring);

  When constantScoring() returns true, the query's boost is the score for every match.
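
  A minimal sketch of the intended semantics, not a patch against Query.java: SketchQuery, rankedScore() and score() are hypothetical names used only for illustration, and the boost accessors are modelled on Query's getBoost()/setBoost().

```java
// Hypothetical stand-in for Query; only the proposed flag and the boost are modelled.
abstract class SketchQuery {
    private float boost = 1.0f;
    private boolean constantScoring = false;

    float getBoost() { return boost; }
    void setBoost(float boost) { this.boost = boost; }

    // The two proposed methods: a getter and a setter sharing one name.
    boolean constantScoring() { return constantScoring; }
    void constantScoring(boolean constantScoring) { this.constantScoring = constantScoring; }

    // The relevance score the query would otherwise compute for a matching document.
    abstract float rankedScore(int docId);

    // Score reported for a match: the boost when constant scoring, else the ranked score.
    final float score(int docId) {
        return constantScoring ? boost : rankedScore(docId);
    }
}
```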

2. Add two methods to Searcher.java:

  public BitSet cachedBitSet(Query query) { return null; }
  public void cacheBitSet(Query query, BitSet bits) {}

  IndexSearcher overrides these to maintain an LRU cache of bitsets.
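
  One way the LRU cache might look, sketched with java.util.LinkedHashMap in access order. BitSetCache and its maxEntries parameter are assumptions for illustration, and the key type is left generic; a real version would key on Query and so depend on Query implementing equals() and hashCode() sensibly.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps a query to the bitset of its matches.
class BitSetCache<Q> {
    private final Map<Q, BitSet> cache;

    BitSetCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<Q, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Q, BitSet> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached bitset, or null if the query has not been cached.
    synchronized BitSet cachedBitSet(Q query) { return cache.get(query); }

    synchronized void cacheBitSet(Q query, BitSet bits) { cache.put(query, bits); }
}
```

  The methods are synchronized because a searcher is typically shared between threads, and even get() reorders an access-ordered LinkedHashMap.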

3. Modify BooleanQuery so that TooManyClauses is not thrown when constantScoring() is true.

4. Modify BooleanScorer so that, when constantScoring() is true, it:
  - checks Searcher for a cached bitset
  - failing that, creates a bitset
  - evaluates clauses serially, saving results in the bitset
  - caches the bitset
  - uses the bitset to handle doc(), next() and skipTo()
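
  The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real BooleanScorer: ConstantScoreSketch and BitSetScorer are hypothetical classes, every clause is assumed to be optional (as in an expanded wildcard or prefix query), each clause is represented simply by the array of document ids it matches, and a plain HashMap keyed by a string stands in for the Searcher cache.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConstantScoreSketch {
    // Stand-in for Searcher.cachedBitSet() / cacheBitSet().
    private final Map<String, BitSet> cache = new HashMap<String, BitSet>();
    int evaluations = 0; // how many times the clauses were actually evaluated

    BitSet bitsFor(String queryKey, List<int[]> clauseMatches) {
        BitSet bits = cache.get(queryKey);       // check for a cached bitset
        if (bits == null) {
            bits = new BitSet();                 // failing that, create one
            for (int[] docs : clauseMatches) {   // evaluate clauses serially
                for (int doc : docs) bits.set(doc);
            }
            evaluations++;
            cache.put(queryKey, bits);           // cache the result
        }
        return bits;
    }
}

// Answers doc(), next() and skipTo() from the bitset; every match scores the boost.
class BitSetScorer {
    private final BitSet bits;
    private final float boost;
    private int doc = -1;
    private boolean exhausted = false;

    BitSetScorer(BitSet bits, float boost) {
        this.bits = bits;
        this.boost = boost;
    }

    int doc() { return doc; }
    float score() { return boost; }
    boolean next() { return skipTo(doc + 1); }

    // Moves to the first match beyond the current one whose id is >= target.
    boolean skipTo(int target) {
        if (exhausted) return false;
        int next = bits.nextSetBit(Math.max(target, doc + 1));
        if (next < 0) {
            exhausted = true;
            return false;
        }
        doc = next;
        return true;
    }
}
```

  Because only one bitset is live at a time, the number of clauses no longer affects memory use, which is why TooManyClauses need not be thrown in this mode.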

5. TermQuery and PhraseQuery could be similarly modified, so that, when constant scoring, bitsets are cached for very common terms (e.g., >5% of documents).
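
The "very common term" test could be as simple as comparing a term's document frequency with the index size. In this sketch CommonTermPolicy and minFraction are hypothetical names, and the two integers are assumed to come from something like IndexReader.docFreq(Term) and IndexReader.maxDoc().

```java
// Hypothetical policy object: decides whether a term is common enough to cache.
class CommonTermPolicy {
    private final double minFraction; // e.g. 0.05 for "more than 5% of documents"

    CommonTermPolicy(double minFraction) {
        this.minFraction = minFraction;
    }

    // True when the term occurs in more than minFraction of the documents.
    boolean shouldCache(int docFreq, int maxDoc) {
        return maxDoc > 0 && docFreq > minFraction * maxDoc;
    }
}
```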

With these changes, WildcardQuery, PrefixQuery, RangeQuery etc., when declared to be constant scoring, will operate much faster and never throw TooManyClauses. We can add an option (the default?) to QueryParser to make these constant scoring.

Also, instead of BitSet we could use an interface:

  public interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    int next(int docId);
  }

to permit sparse representations.
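
For example, a sparse implementation might keep a sorted array of document ids. This is only a sketch: SortedIntDocIdSet is a hypothetical name, the interface is repeated (without the public modifier) so the block compiles on its own, and next(docId) is assumed to return the smallest member greater than or equal to docId, or -1 if there is none, since the proposal does not pin that down.

```java
import java.util.Arrays;

interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    int next(int docId);
}

// Sparse representation: memory grows with the number of matches, not with maxDoc.
class SortedIntDocIdSet implements DocIdSet {
    private int[] docs = new int[8];
    private int size = 0;

    public void add(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        if (pos >= 0) return;                       // already present
        int insertAt = -pos - 1;                    // decode the insertion point
        if (size == docs.length) docs = Arrays.copyOf(docs, size * 2);
        // Shift the tail right by one slot to keep the array sorted.
        System.arraycopy(docs, insertAt, docs, insertAt + 1, size - insertAt);
        docs[insertAt] = docId;
        size++;
    }

    public boolean contains(int docId) {
        return Arrays.binarySearch(docs, 0, size, docId) >= 0;
    }

    // Assumed semantics: smallest member >= docId, or -1 when there is none.
    public int next(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        if (pos < 0) pos = -pos - 1;
        return pos < size ? docs[pos] : -1;
    }
}
```

A BitSet-backed implementation of the same interface would suit dense results, so the choice could be made per query based on how many documents match.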

Thoughts?

Doug

