Hi All,
I applied Grant's new term vector patch. Term vectors can now optionally be stored with positions (token number) and/or offsets (start and end offset of the token in the original text).
I ended up with some modifications of Grant's patch:
*) The IndexReader.getTermVectors(...) methods now always return null if there are no term vectors for the specified input. If they throw an IOException, this indicates that there was an error while accessing the index. Until now, IOExceptions were caught in TermVectorsReader. This is a small change to the old API, but I think it's more consistent.
*) I made some low-level changes to how positions and offsets are read and written. Most importantly, I switched to delta-encoding where possible. This should save some space.
*) I changed the public API of term vectors a little bit. E.g. IndexReader now also uses Field.TermVector.VALUE instead of the boolean variables.
*) I did some code restructuring and removed some unused methods.
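For anyone unfamiliar with the delta-encoding mentioned above, here is a minimal sketch of the idea in plain Java. It is not the code from the patch: the class and method names (DeltaEncodingSketch, encode, decode) are made up for illustration, and it assumes the input values are sorted in ascending order, as token positions within a document are. Instead of storing each value, it stores the gap to the previous value. The gaps are usually small numbers, and small numbers take fewer bytes in a variable-length integer format such as Lucene's VInt.

```java
import java.util.Arrays;

// Illustrative only: names and structure are assumptions, not the patch's code.
public class DeltaEncodingSketch {

    // Replace each value by its difference from the previous value.
    // The first value is stored as its difference from 0, i.e. unchanged.
    public static int[] encode(int[] sortedValues) {
        int[] deltas = new int[sortedValues.length];
        int previous = 0;
        for (int i = 0; i < sortedValues.length; i++) {
            deltas[i] = sortedValues[i] - previous;
            previous = sortedValues[i];
        }
        return deltas;
    }

    // Rebuild the original values by keeping a running sum of the deltas.
    public static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] positions = {3, 17, 18, 42, 107};
        int[] deltas = encode(positions);
        System.out.println(Arrays.toString(deltas));                   // prints [3, 14, 1, 24, 65]
        System.out.println(Arrays.equals(positions, decode(deltas)));  // prints true
    }
}
```

The same scheme could in principle be applied to offsets, e.g. by storing each start offset relative to the previous token's end offset; whether the patch does exactly that is not something this sketch claims.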
All unit tests still pass. I hope everything I did was correct :-) Looking forward to feedback.
Christoph