Re: Performance of hit highlighting and finding term positions for a specific document

Kevin A. Burton Tue, 30 Mar 2004 17:50:20 -0800

Erik Hatcher wrote:

On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote:

Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms.
This seems that it's very inefficient since lucene already knows the frequency and position of given terms in the index.

What if the original analyzer removed stopped words, stemmed, and injected synonyms?

Just use the same analyzer :)... I agree it's not the best approach for this reason and the CPU reason.

Also it seems that after all this time that Lucene should have efficient hit highlighting as a standard package. Is there any interest in seeing a contribution in the sandbox for this if it uses the index positions?

Big +1, regardless of the implementation details. Hit hilighting is so commonly requested that having it available at least in the sandbox, or perhaps even in the core, makes a lot of sense.

Well if we could make it efficient by using the frequency and positions of terms we're all set :)... I just need to figure out how to do this efficiently per document.

Kevin

--

Please reply using PGP.

http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

signature.asc
Description: OpenPGP digital signature

Re: Performance of hit highlighting and finding term positions for a specific document

Reply via email to