Christoph Goller wrote:
Problem: In the TermInfosReader index (every 128th term), skipOffsets are not
stored! According to the documentation, getIndexOffset returns the offset of the
greatest index entry which is less than term. I believe this is not true: it may
return the entry for the term itself! If we seek a term that is in the index, this
term and its termInfo will not be read from the enumerator by scanEnum, and
consequently no skipOffset will be found, even if present. This could lead
to serious problems when skipTo is used, couldn't it?
Yes, this does look like a problem.
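The failure mode described above can be reproduced with a small model. This is only a sketch, not Lucene's code: the class, the `Entry` type, the interval of 4 (instead of 128) and the offset values are all made up for illustration. It assumes the index holds every n-th term without a skipOffset, that the lookup returns the greatest index entry less than or equal to the term, and that the scan only advances while the current term is smaller than the target.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the term dictionary and its index.
final class IndexSkipOffsetSketch {
    static final int INDEX_INTERVAL = 4;   // the thread talks about 128
    static final int NOT_STORED = -1;      // marker for "no skipOffset available"

    static final class Entry {
        final String term;
        final int skipOffset;
        Entry(String term, int skipOffset) {
            this.term = term;
            this.skipOffset = skipOffset;
        }
    }

    // Full dictionary (think *.tis): every term carries a skipOffset.
    static List<Entry> buildDictionary(String[] sortedTerms) {
        List<Entry> dict = new ArrayList<>();
        for (int i = 0; i < sortedTerms.length; i++) {
            dict.add(new Entry(sortedTerms[i], 100 + i)); // made-up offsets
        }
        return dict;
    }

    // Index (think *.tii): every INDEX_INTERVAL-th term, skipOffset dropped.
    static List<Entry> buildIndex(List<Entry> dict) {
        List<Entry> index = new ArrayList<>();
        for (int i = 0; i < dict.size(); i += INDEX_INTERVAL) {
            index.add(new Entry(dict.get(i).term, NOT_STORED));
        }
        return index;
    }

    // Binary search for the greatest index entry <= term.
    // Note the "equal" case: the term itself is returned.
    static int getIndexOffset(List<Entry> index, String term) {
        int lo = 0;
        int hi = index.size() - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int cmp = term.compareTo(index.get(mid).term);
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;
    }

    // Seed the enumerator from the index entry, then scan forward.
    static int skipOffsetFor(String[] sortedTerms, String term) {
        List<Entry> dict = buildDictionary(sortedTerms);
        List<Entry> index = buildIndex(dict);
        int indexOffset = getIndexOffset(index, term);
        if (indexOffset < 0) {
            throw new IllegalArgumentException("term precedes all index entries");
        }
        Entry current = index.get(indexOffset);
        int pos = indexOffset * INDEX_INTERVAL;
        // The scan reads from the full dictionary only while current < term.
        while (current.term.compareTo(term) < 0 && pos + 1 < dict.size()) {
            pos++;
            current = dict.get(pos);
        }
        if (!current.term.equals(term)) {
            throw new IllegalArgumentException("term not found: " + term);
        }
        return current.skipOffset;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d", "e", "f", "g", "h"};
        System.out.println(skipOffsetFor(terms, "e")); // indexed term: -1
        System.out.println(skipOffsetFor(terms, "f")); // non-indexed term: 105
    }
}
```

In this model, "e" is an index entry, so the scan loop never runs and the skipOffset stays at the "not stored" marker, while "f" is read from the full dictionary and gets its offset.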
Possible Solution: Store skipOffset in *.tii too.
I think that's a good solution. We should change TermInfosWriter.FORMAT from -1 to -2 and then use that to keep SegmentTermEnum.next() back-compatible, since folks may have created indexes with 1.4RC2. The simplest way to do this would be to disable skipTo() when TermInfosWriter.FORMAT is -1, by setting skipInterval to Integer.MAX_VALUE, as is done for 1.3 indexes.
Shall I do this, or would you like to?
I would prefer to leave this task to you :-)
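The back-compatibility idea proposed above could look roughly like the following. It is a hedged sketch, not the actual patch: the class and method names are hypothetical, and it assumes that a non-negative first value marks a file without a format header (the 1.3 case) and that skip data is only consulted when docFreq reaches skipInterval.

```java
// Hypothetical sketch of gating skipTo() on the term dictionary format.
final class FormatGateSketch {
    static final int FORMAT_1_4_RC2 = -1;   // index entries lack skipOffset
    static final int FORMAT_PROPOSED = -2;  // index entries would carry skipOffset

    // Decide which skipInterval a reader should work with.
    static int effectiveSkipInterval(int format, int storedSkipInterval) {
        if (format < FORMAT_PROPOSED) {
            throw new IllegalArgumentException("unknown format: " + format);
        }
        if (format == FORMAT_PROPOSED) {
            return storedSkipInterval;      // skip data can be trusted
        }
        // Format -1, or no format header at all (assumed: value >= 0):
        // make the interval unreachable so skipping is never attempted.
        return Integer.MAX_VALUE;
    }

    // Assumption: skip data is used only when docFreq >= skipInterval.
    static boolean usesSkipData(int docFreq, int skipInterval) {
        return docFreq >= skipInterval;
    }

    public static void main(String[] args) {
        int oldInterval = effectiveSkipInterval(FORMAT_1_4_RC2, 16);
        int newInterval = effectiveSkipInterval(FORMAT_PROPOSED, 16);
        System.out.println(usesSkipData(1000, oldInterval)); // false
        System.out.println(usesSkipData(1000, newInterval)); // true
    }
}
```

Setting the interval to Integer.MAX_VALUE keeps the reader's code path unchanged for old indexes; they simply fall back to scanning instead of skipping.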
However, I am currently debugging/stepping through a problem found by Daniel
with 1.4rc2. Maybe it's caused by a skipTo() bug. I am not sure yet. Maybe it's a bug in ConjunctionScorer. If I cannot solve the problem, I will post it to the mailing list tonight.
What about the following agreement: I try to restructure the IndexReader stuff as we already agreed, you try to solve the skipTo() problem, and then we review each other's work.
Christoph