Re: Highlighter API

markharw00d Fri, 18 Feb 2005 15:34:59 -0800

The Highlighter's getBestFragment method takes a TokenStream and a text. Wouldn't it be easier to give it just the text and an analyzer?


That's how it was originally coded. The move to TokenStream was a
deliberate choice, made to decouple the highlighter from the source of
tokens and to allow alternatives. Re-analyzing document text with an Analyzer
is one (potentially costly) way of getting Tokens. Another is to use the new
TermVector support (see TokenSources.java in the highlighter package). In my
apps I have query processing stages which use TokenStreams to extract themes
from result sets, and the TokenStream output produced in that stage can
usefully be cached and reused in the highlighting stage.

If ease of use is your concern, I would suggest wrapping the highlighter
functionality with a simpler (Analyzer-based) interface rather than changing
the internals of the highlighter implementation. That way, more experienced
users still have the option to use optimized alternatives in the underlying
code.
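
The wrapping idea can be sketched roughly as follows. This is only an
illustration, not code from the highlighter package: TokenStream, Analyzer and
FragmentHighlighter here are reduced stand-ins declared inline so the sketch
compiles on its own, and AnalyzerHighlighter is a hypothetical name. In a real
application the wrapper would hold Lucene's own Analyzer and Highlighter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in for Lucene's TokenStream, reduced to what this sketch needs (assumption).
interface TokenStream {
    String next() throws IOException; // next token text, or null when exhausted
}

// Stand-in for Lucene's Analyzer; tokenStream(field, reader) mirrors the call
// used in the original proposal.
interface Analyzer {
    TokenStream tokenStream(String fieldName, Reader reader);
}

// Stand-in for the TokenStream-based highlighter method discussed in this thread.
interface FragmentHighlighter {
    String getBestFragment(TokenStream tokenStream, String text) throws IOException;
}

// Hypothetical convenience wrapper: callers pass only the text, while the
// TokenStream-based highlighter underneath stays unchanged and remains
// available to callers who get their tokens from a cheaper source.
final class AnalyzerHighlighter {
    private final FragmentHighlighter highlighter;
    private final Analyzer analyzer;
    private final String fieldName;

    AnalyzerHighlighter(FragmentHighlighter highlighter, Analyzer analyzer, String fieldName) {
        this.highlighter = highlighter;
        this.analyzer = analyzer;
        this.fieldName = fieldName;
    }

    // Re-analyzes the text on every call (the potentially costly path) and
    // delegates to the TokenStream-based method.
    String getBestFragment(String text) throws IOException {
        TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(text));
        return highlighter.getBestFragment(tokenStream, text);
    }
}
```

Because the wrapper builds the stream from the same text it passes on, it also
avoids the stream/text mismatch, without taking the TokenStream entry point
away from anyone.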

Cheers,
Mark

Daniel Naber wrote:

Hi,

the Highlighter's getBestFragment method takes a TokenStream and a text. Wouldn't it be easier to give it just the text and an analyzer so the user doesn't have to care about building a TokenStream? Like this:

    public final String getBestFragment(Analyzer analyzer, String text)
        throws IOException
    {
        TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text));
        return getBestFragment(tokenStream, text);
    }

The old method could then be deprecated. Or am I missing something? This would also avoid problems in case the stream doesn't match the text.

Regards
Daniel
