Kevin A. Burton wrote:

I'm playing with this package:

http://home.clara.net/markharwood/lucene/highlight.htm

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms.
This seems very inefficient, since Lucene already knows the frequency and position of given terms in the index.

My question is whether it's hard to find a TermPosition for a given term in a given document rather than the whole index.

IndexReader.termPositions( Term term ) is term-specific, not term- and document-specific.
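
To make the distinction concrete, here is a toy sketch in plain Java. It is not Lucene's API: the class PositionalIndexSketch and its methods are hypothetical names invented for illustration, and it assumes simple whitespace tokenization. It models a positional index as term -> document -> token positions, so that a lookup can be narrowed to one term in one document instead of walking the term's postings across the whole index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a positional index; not part of Lucene.
class PositionalIndexSketch {
    // term -> (docId -> token positions of that term in that document)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new TreeMap<>();

    // Tokenizes on whitespace (an assumption) and records each token's position.
    void addDocument(int docId, String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Term- and document-specific lookup: positions of `term` in `docId` only.
    List<Integer> positions(String term, int docId) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null) {
            return Collections.<Integer>emptyList();
        }
        List<Integer> found = docs.get(docId);
        return found == null
                ? Collections.<Integer>emptyList()
                : Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.addDocument(1, "Lucene hit highlighting with Lucene");
        index.addDocument(2, "hit highlighting");
        System.out.println(index.positions("lucene", 1)); // prints [0, 4]
    }
}
```

Note that these are token positions, not character offsets. A highlighter would still need some way to map a token position back to a span of the original text.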

As far as I know, it's not currently possible to get this information from a standard Lucene index.

Also, it seems that after all this time Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

I've been meaning to look into good ways to store token offset information to allow for very efficient highlighting, and I believe Mark may also be looking into improving the highlighter via other means, such as temporary RAM indexes. Search the archives for background on some of the ideas we've tossed around ('Dmitry's Term Vector stuff, plus some' and 'Demoting results' come to mind as threads that touch on this topic).


Regards,

Bruce Ritchie
http://www.jivesoftware.com/
