RE: Faster highlighting with TermPositionVectors

Bruce Ritchie Thu, 28 Oct 2004 22:13:33 -0700

Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector 
> support we can now make use of term offset information held 
> in the Lucene index rather than incurring the cost of 
> re-analyzing text to highlight it.
> 
> I have created a class (see 
> http://www.inperspective.com/lucene/TokenSources.java) which 
> handles creating a TokenStream from the TermPositionVector 
> stored in the database which can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the 
> original text.
> If people are happy with this class I'll add it to the 
> Highlighter sandbox but it may sit better elsewhere in the 
> Lucene code base as a more general purpose utility.
> 
> BTW as part of putting this together I found that the 
> TermFreq code throws a null pointer when indexing fields that 
> produce no tokens (ie empty or all stopwords). Otherwise 
> things work very well.


This is great news! While I probably won't have time to test this until 
mid-November, I do look forward to the speed improvement, as the current 
highlighting mechanism (re-parsing the text) was just not performant enough 
under heavy load.
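
For anyone following the thread, here is a rough sketch of the idea as I 
understand it: rebuild an ordered token list from stored per-term offsets and 
highlight from that, with no analyzer involved. It is plain Java and does not 
use the Lucene API or the TokenSources class; the class name, the method names 
and the parallel-array layout standing in for a term vector are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetTokenSketch {

    /** A term plus its character offsets in the original text. */
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;

        public Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Hypothetical stand-in for a stored term vector: terms[i] occurs at
     * every {start, end} pair in offsets[i]. Vectors are grouped by term,
     * so the tokens are sorted by start offset to restore text order.
     */
    public static List<Token> tokensFromVector(String[] terms, int[][][] offsets) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int[] span : offsets[i]) {
                tokens.add(new Token(terms[i], span[0], span[1]));
            }
        }
        tokens.sort(Comparator.comparingInt((Token t) -> t.start));
        return tokens;
    }

    /** Wraps each occurrence of queryTerm in <B> tags using stored offsets only. */
    public static String highlight(String text, List<Token> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokens) {
            if (t.term.equals(queryTerm)) {
                out.append(text, last, t.start)
                   .append("<B>").append(text, t.start, t.end).append("</B>");
                last = t.end;
            }
        }
        return out.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        String text = "Fast highlighting is fast";
        // Lower-cased terms with the stopword "is" dropped, as an analyzer might do at index time.
        String[] terms = {"fast", "highlighting"};
        int[][][] offsets = {{{0, 4}, {21, 25}}, {{5, 17}}};
        List<Token> tokens = tokensFromVector(terms, offsets);
        System.out.println(highlight(text, tokens, "fast"));
    }
}
```

The sort is the only real work here, which is why reading offsets back should 
be cheaper than running the analyzer over the full text again.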


Regards,

Bruce Ritchie
http://www.jivesoftware.com

