Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector
> support we can now make use of term offset information held
> in the Lucene index rather than incurring the cost of
> re-analyzing text to highlight it.
>
> I have created a class ( see
> http://www.inperspective.com/lucene/TokenSources.java ) which
> handles creating a TokenStream from the TermPositionVector
> stored in the database which can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the
> original text.
> If people are happy with this class I'll add it to the
> Highlighter sandbox but it may sit better elsewhere in the
> Lucene code base as a more general purpose utility.
>
> BTW as part of putting this together I found that the
> TermFreq code throws a null pointer when indexing fields that
> produce no tokens (ie empty or all stopwords). Otherwise
> things work very well.
This is great news! I won't have time to test this until probably mid-November, but I look forward to the speed improvements: the current highlighting mechanism (re-parsing the text) is just not performant enough under heavy load.

Regards,

Bruce Ritchie
http://www.jivesoftware.com
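
For anyone following the thread, here is a rough sketch of the idea as I understand it: if each term's character offsets are already stored, a position-ordered token list can be rebuilt by sorting on start offset, with no analyzer involved. This is plain Java and deliberately does not use the Lucene classes; `OffsetTokenSketch`, `Token`, `tokensFromOffsets` and `highlight` are hypothetical names of mine, the map of term to offset pairs only stands in for what a TermPositionVector would hold, and the `<B>` markup is an arbitrary choice. It is not Mark's TokenSources implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for rebuilding tokens from stored offsets; not Lucene code.
final class OffsetTokenSketch {

    /** One token occurrence: the term plus its character span in the stored text. */
    static final class Token {
        final String term;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Flattens per-term offset pairs ({start, end}) into a single list ordered
     * by start offset, which is the order an analyzer would have produced.
     */
    static List<Token> tokensFromOffsets(Map<String, int[][]> termOffsets) {
        List<Token> tokens = new ArrayList<>();
        for (Map.Entry<String, int[][]> entry : termOffsets.entrySet()) {
            for (int[] span : entry.getValue()) {
                tokens.add(new Token(entry.getKey(), span[0], span[1]));
            }
        }
        tokens.sort(Comparator.comparingInt(t -> t.start));
        return tokens;
    }

    /** Wraps every occurrence of queryTerm in <B>...</B> using the stored offsets only. */
    static String highlight(String text, List<Token> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokens) {
            if (!t.term.equals(queryTerm)) {
                continue;
            }
            out.append(text, last, t.start)
               .append("<B>").append(text, t.start, t.end).append("</B>");
            last = t.end;
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy fox";
        // Offsets as they might have been recorded at index time (assumed values).
        Map<String, int[][]> offsets = new LinkedHashMap<>();
        offsets.put("fox", new int[][] {{16, 19}, {40, 43}});
        offsets.put("quick", new int[][] {{4, 9}});
        System.out.println(highlight(text, tokensFromOffsets(offsets), "fox"));
    }
}
```

The sort is the only real work here, which is presumably where the saving over re-parsing comes from. A field that produced no tokens simply yields an empty list in this sketch.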