Re: TermDocs.skipTo()

Christoph Goller Wed, 07 Apr 2004 11:13:59 -0700

Doug,

Daniel found a bug today and therefore I reviewed skipTo once again.
Here are some further things to consider:

*) MultiTermDocs.skipTo could easily be optimized too, couldn't it?

*) SegmentTermDocs: skipStream is never closed.

*) SegmentTermPositions: seek(TermInfo) should probably always set
proxCount = 0;

*) I think that, due to your last changes, SegmentTermDocs makes one skip fewer than it could. However, I haven't tested this.

      while (target > skipDoc && skipCount < numSkips) {
        lastSkipDoc = skipDoc;
        lastFreqPointer = freqPointer;
        lastProxPointer = proxPointer;

        if (skipDoc != 0 && skipDoc >= doc)
          numSkipped += skipInterval;

        skipDoc += skipStream.readVInt();
        freqPointer += skipStream.readVInt();
        proxPointer += skipStream.readVInt();

        skipCount++;
      }

      // if we found something to skip, then skip it
      if (lastFreqPointer > freqStream.getFilePointer()) {
        freqStream.seek(lastFreqPointer);
        skipProx(lastProxPointer);

        doc = lastSkipDoc;
        count += numSkipped;
      }

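Consider the case where the while loop exits because skipCount == numSkips while target is still greater than skipDoc. Then doc becomes lastSkipDoc, not skipDoc, although the last entry read could still have been used!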
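
To make the case concrete, here is a small stand-alone sketch that replays only the doc-number bookkeeping of the loop above on an in-memory array of skip deltas. It is not the Lucene code: the class name, the method names and the array-based "skip stream" are made up for illustration, the freq/prox pointers are left out, and the second method is just one possible way to handle the exhausted-skip-list case, not necessarily what should be committed.

```java
class SkipLoopSketch {

  // Replays the quoted loop on an array of doc deltas (hypothetical stand-in
  // for skipStream.readVInt()). Returns the doc the reader would be
  // positioned on after skipping, i.e. lastSkipDoc.
  static int quotedLoop(int[] skipDeltas, int target) {
    int skipDoc = 0;
    int lastSkipDoc = 0;
    int skipCount = 0;
    int numSkips = skipDeltas.length;
    while (target > skipDoc && skipCount < numSkips) {
      lastSkipDoc = skipDoc;
      skipDoc += skipDeltas[skipCount];
      skipCount++;
    }
    return lastSkipDoc;
  }

  // Assumed variant: if the loop ended only because the skip entries ran
  // out, the last entry read is still below target and can be used too.
  static int useLastEntryWhenExhausted(int[] skipDeltas, int target) {
    int skipDoc = 0;
    int lastSkipDoc = 0;
    int skipCount = 0;
    int numSkips = skipDeltas.length;
    while (target > skipDoc && skipCount < numSkips) {
      lastSkipDoc = skipDoc;
      skipDoc += skipDeltas[skipCount];
      skipCount++;
    }
    return (target > skipDoc) ? skipDoc : lastSkipDoc;
  }

  public static void main(String[] args) {
    int[] deltas = {16, 16, 16};  // skip entries at docs 16, 32 and 48
    System.out.println(quotedLoop(deltas, 100));
    System.out.println(useLastEntryWhenExhausted(deltas, 100));
  }
}
```

With skip entries at docs 16, 32 and 48 and a target of 100, the first method stops at doc 32 and leaves the rest to a linear scan, while the second one starts from doc 48. For targets that lie inside the skip list (for example 40) both behave the same.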

*) PhraseScorer.skipTo jumps one doc too far because of the call to sort(), which calls next() for each PhrasePositions. Here is Daniel's test, which demonstrates this:

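import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class DanielBug {

  private final static String DIR = "/tmp/testindex";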

  public static void main (String[] args) throws Exception {
    Analyzer a = new StandardAnalyzer();
    IndexWriter iw = new IndexWriter(DIR, a, true);

    Document d = new Document();
    // 0 hits only if this field contains the same value as
    // the same field in the next document:
    d.add(new Field("source", "marketing info", true, true, true));
    iw.addDocument(d);

    d = new Document();
    d.add(new Field("contents", "foobar", true, true, true));
    d.add(new Field("source", "marketing info", true, true, true));
    iw.addDocument(d);

    iw.optimize();
    iw.close();
    System.out.println("Indexing Done");

    IndexSearcher is = new IndexSearcher(DIR);
    Query q = QueryParser.parse("+contents:foobar +source:\"marketing info\"", "", a);
    Hits hits = is.search(q);
    System.out.println("q="+q);
    System.out.println("hits="+hits.length());
  }

}

Instead of 1 hit, 0 hits are found with 1.4rc2, while 1.3 finds the hit. I
committed the necessary change to PhraseScorer already and it fixes the problem.
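
For illustration, the effect can be reproduced without Lucene. The following is only a sketch under simplifying assumptions: Cursor is a made-up stand-in for PhrasePositions that iterates over doc ids only (no positions), and the advanceInSort flag imitates a sort() that calls next() on every cursor after they have already been positioned by skipTo. It is not the committed change.

```java
class PhraseSkipSketch {

  // Hypothetical cursor over a sorted list of doc ids.
  static final class Cursor {
    private final int[] docs;
    private int idx = -1;

    Cursor(int... docs) { this.docs = docs; }

    boolean next() { idx++; return idx < docs.length; }

    // Moves to the first entry beyond the current one with doc >= target.
    boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (docs[idx] < target);
      return true;
    }

    int doc() { return docs[idx]; }
  }

  // Returns the first doc >= target that all cursors contain, or -1.
  // advanceInSort = true imitates the extra next() per cursor.
  static int skipTo(int target, boolean advanceInSort, Cursor... cursors) {
    for (Cursor c : cursors) {
      if (!c.skipTo(target)) return -1;
    }
    if (advanceInSort) {
      for (Cursor c : cursors) {
        if (!c.next()) return -1;   // moves past a doc that may match
      }
    }
    // Simplified alignment: advance lagging cursors until all agree.
    while (true) {
      int max = -1;
      for (Cursor c : cursors) max = Math.max(max, c.doc());
      boolean aligned = true;
      for (Cursor c : cursors) {
        while (c.doc() < max) {
          if (!c.next()) return -1;
        }
        if (c.doc() != max) aligned = false;
      }
      if (aligned) return max;
    }
  }

  public static void main(String[] args) {
    // Both phrase terms occur in docs 0 and 1, as in the test above.
    System.out.println(skipTo(1, true, new Cursor(0, 1), new Cursor(0, 1)));
    System.out.println(skipTo(1, false, new Cursor(0, 1), new Cursor(0, 1)));
  }
}
```

With both terms in docs 0 and 1 and a target of 1, the variant with the extra next() runs off the end and reports no match, while the variant without it stops on doc 1. This matches the 0 hits versus 1 hit seen in the test.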

Unfortunately, I haven't found the time to restructure the IndexReaders so far.
Hopefully tomorrow :-)

Christoph

