Daniel found a bug today, so I reviewed skipTo once again. Here are some further things to consider:
*) MultiTermDocs.skipTo could easily be optimized too, couldn't it?
*) SegmentTermDocs: skipStream is never closed.
*) SegmentTermPositions: seek(TermInfo) should probably always set proxCount = 0.
*) I think that, due to your last changes, SegmentTermDocs makes one skip fewer than required, though I haven't tested this:
while (target > skipDoc && skipCount < numSkips) {
  lastSkipDoc = skipDoc;
  lastFreqPointer = freqPointer;
  lastProxPointer = proxPointer;

  if (skipDoc != 0 && skipDoc >= doc)
    numSkipped += skipInterval;

  skipDoc += skipStream.readVInt();
  freqPointer += skipStream.readVInt();
  proxPointer += skipStream.readVInt();

  skipCount++;
}

// if we found something to skip, then skip it
if (lastFreqPointer > freqStream.getFilePointer()) {
  freqStream.seek(lastFreqPointer);
  skipProx(lastProxPointer);

  doc = lastSkipDoc;
  count += numSkipped;
}
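Consider the case where the while loop exits because skipCount == numSkips even though target > skipDoc still holds: then doc becomes lastSkipDoc, not skipDoc, so the last skip entry that was read is never used!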
*) PhraseScorer.skipTo jumps one doc too far because of the call to sort(), which calls next() for each PhrasePosition. Here is Daniel's test that demonstrates this:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class DanielBug {
  private final static String DIR = "/tmp/testindex";

  public static void main(String[] args) throws Exception {
    Analyzer a = new StandardAnalyzer();
    IndexWriter iw = new IndexWriter(DIR, a, true);

    Document d = new Document();
    // 0 hits only if this field contains the same value as
    // the same field in the next document:
    d.add(new Field("source", "marketing info", true, true, true));
    iw.addDocument(d);

    d = new Document();
    d.add(new Field("contents", "foobar", true, true, true));
    d.add(new Field("source", "marketing info", true, true, true));
    iw.addDocument(d);

    iw.optimize();
    iw.close();
    System.out.println("Indexing Done");

    IndexSearcher is = new IndexSearcher(DIR);
    Query q = QueryParser.parse("+contents:foobar +source:\"marketing info\"", "", a);
    Hits hits = is.search(q);
    System.out.println("q=" + q);
    System.out.println("hits=" + hits.length());
    is.close();
  }
}
With 1.4rc2, 0 hits are found instead of 1, while 1.3 finds the hit. I have already committed the necessary change to PhraseScorer, and it fixes the problem.
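Regarding the MultiTermDocs point: one way it could be optimized is to delegate skipTo to the per-segment TermDocs, shifting the target by the segment's doc base, instead of calling next() until doc() >= target. Below is a minimal, self-contained sketch of that idea under simplifying assumptions: Postings, ArrayPostings and MultiPostings are hypothetical stand-ins rather than Lucene classes, and the per-segment skipTo is a plain linear scan where the real code would use skip data.

```java
/**
 * Sketch (hypothetical names, not Lucene's API): a MultiTermDocs-style skipTo
 * that delegates to the per-segment iterators instead of looping over next().
 */
public class MultiSkipSketch {

  /** Minimal stand-in for the TermDocs iteration contract. */
  interface Postings {
    int doc();
    boolean next();
    /** Advances at least once, to the first doc >= target. */
    boolean skipTo(int target);
  }

  /** One segment's postings; doc numbers are segment-local. */
  static class ArrayPostings implements Postings {
    private final int[] docs;
    private int i = -1;

    ArrayPostings(int... docs) { this.docs = docs; }

    public int doc() { return docs[i]; }

    public boolean next() { return ++i < docs.length; }

    public boolean skipTo(int target) {
      // A linear scan stands in for real per-segment skip data.
      do {
        if (!next()) return false;
      } while (target > doc());
      return true;
    }
  }

  /** Concatenation of segments; starts[k] is the doc base of segment k. */
  static class MultiPostings implements Postings {
    private final Postings[] subs;
    private final int[] starts;
    private int pointer = 0;         // index of the next segment to open
    private int base = 0;            // doc base of the current segment
    private Postings current = null;

    MultiPostings(Postings[] subs, int[] starts) {
      this.subs = subs;
      this.starts = starts;
    }

    public int doc() { return base + current.doc(); }

    public boolean next() {
      for (;;) {
        if (current != null && current.next()) return true;
        if (pointer >= subs.length) return false;
        base = starts[pointer];
        current = subs[pointer++];
      }
    }

    public boolean skipTo(int target) {
      for (;;) {
        // Delegate with a segment-local target; if this segment is
        // exhausted, open the next one and try again there.
        if (current != null && current.skipTo(target - base)) return true;
        if (pointer >= subs.length) return false;
        base = starts[pointer];
        current = subs[pointer++];
      }
    }
  }

  public static void main(String[] args) {
    // Segment 0 holds docs 0, 2, 5; segment 1 (base 10) holds global docs 11, 13.
    MultiPostings m = new MultiPostings(
        new Postings[] { new ArrayPostings(0, 2, 5), new ArrayPostings(1, 3) },
        new int[] { 0, 10 });
    m.skipTo(3);
    System.out.println(m.doc()); // 5
    m.skipTo(6);
    System.out.println(m.doc()); // 11
  }
}
```

The point of the design is that each segment can use its own skip list, and the outer loop only moves to the next segment when the current one is exhausted.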
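To make the SegmentTermDocs skip-loop issue concrete, here is a toy model of the loop quoted above. It is a sketch under simplifying assumptions: skip entries hold absolute doc numbers instead of VInt deltas, the freq/prox pointers and numSkipped are left out, and buggyLanding/fixedLanding are hypothetical names. fixedLanding shows one possible fix, which checks skipCount >= numSkips inside the loop after saving the last* values, so the final entry can still be used.

```java
/**
 * Toy model of the SegmentTermDocs.skipTo skip loop. Assumptions: skip
 * entries hold absolute doc numbers instead of VInt deltas, and the
 * freq/prox pointers and numSkipped are omitted. The return value is the
 * doc the loop would position on (0 = nothing skipped).
 */
public class SkipLoopSketch {

  /** Mirrors the loop as quoted: the numSkips check is in the while condition. */
  static int buggyLanding(int[] skipDocs, int target) {
    int numSkips = skipDocs.length;
    int skipDoc = 0, lastSkipDoc = 0, skipCount = 0;
    while (target > skipDoc && skipCount < numSkips) {
      lastSkipDoc = skipDoc;
      skipDoc = skipDocs[skipCount]; // read the next skip entry
      skipCount++;
    }
    // If we left because skipCount == numSkips, the entry just read is lost.
    return lastSkipDoc;
  }

  /** One possible fix: save the last* values first, then stop if no entries remain. */
  static int fixedLanding(int[] skipDocs, int target) {
    int numSkips = skipDocs.length;
    int skipDoc = 0, lastSkipDoc = 0, skipCount = 0;
    while (target > skipDoc) {
      lastSkipDoc = skipDoc;
      if (skipCount >= numSkips) break; // final entry already saved above
      skipDoc = skipDocs[skipCount];
      skipCount++;
    }
    return lastSkipDoc;
  }

  public static void main(String[] args) {
    int[] skipDocs = { 15, 31, 47 };
    System.out.println(buggyLanding(skipDocs, 100)); // 31: one skip short
    System.out.println(fixedLanding(skipDocs, 100)); // 47
  }
}
```

With skip entries {15, 31, 47} and target 100, the quoted loop lands on 31 while the variant lands on 47; for target 40 both land on 31, so the difference only shows up when the skip data runs out before the target.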
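As far as I can tell, this is roughly how Daniel's query loses its hit: the required contents:foobar clause only matches doc 1, so the phrase part is asked to skipTo(1), and an extra next() after skipping moves every term past doc 1. The toy model below only tracks doc numbers (no positions, so it models doc-level alignment rather than phrase matching); Positions, skipToBuggy, skipToFixed and align are hypothetical names, and this is not the actual PhraseScorer code or the committed change.

```java
/**
 * Toy, doc-level model of skipping a phrase's terms (no positions).
 * All names are hypothetical; this is not the actual PhraseScorer code.
 * A return value of -1 means "no match".
 */
public class PhraseSkipSketch {

  /** Sorted doc list for one term, standing in for a PhrasePositions. */
  static class Positions {
    private final int[] docs;
    private int i = -1;

    Positions(int... docs) { this.docs = docs; }

    boolean next() { return ++i < docs.length; }

    int doc() { return docs[i]; }

    boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (target > doc());
      return true;
    }
  }

  /** Skips every term to target, then (wrongly) advances each one again. */
  static int skipToBuggy(Positions[] pps, int target) {
    for (Positions pp : pps) if (!pp.skipTo(target)) return -1;
    for (Positions pp : pps) if (!pp.next()) return -1; // overshoots target
    return align(pps);
  }

  /** Skips every term to target and only aligns them afterwards. */
  static int skipToFixed(Positions[] pps, int target) {
    for (Positions pp : pps) if (!pp.skipTo(target)) return -1;
    return align(pps);
  }

  /** Advances lagging terms until all are on the same doc (a conjunction). */
  static int align(Positions[] pps) {
    for (;;) {
      int max = Integer.MIN_VALUE;
      for (Positions pp : pps) max = Math.max(max, pp.doc());
      boolean aligned = true;
      for (Positions pp : pps) {
        if (pp.doc() < max) {
          aligned = false;
          if (!pp.skipTo(max)) return -1;
        }
      }
      if (aligned) return max;
    }
  }

  public static void main(String[] args) {
    // "marketing" and "info" both occur in docs 0 and 1, as in Daniel's test.
    Positions[] a = { new Positions(0, 1), new Positions(0, 1) };
    Positions[] b = { new Positions(0, 1), new Positions(0, 1) };
    System.out.println(skipToBuggy(a, 1)); // -1: the hit in doc 1 is lost
    System.out.println(skipToFixed(b, 1)); // 1
  }
}
```

If both terms also occurred in a doc 2, the buggy variant would return 2 instead of 1, which is the "one doc too far" behaviour; with only two documents it finds nothing, matching the 0 hits.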
Unfortunately, I haven't found the time to restructure the IndexReaders yet. Hopefully tomorrow :-)
Christoph