
That's how it was originally coded. The move to TokenStream was a deliberate choice, made to decouple the highlighter from the source of tokens and enable alternatives. Re-analyzing document text with an Analyzer is one (potentially costly) way of getting Tokens. Another is to use the new TermVector support (see TokenSources.java in the highlighter package). In my apps I have query processing stages which use TokenStreams to extract themes from result sets, and the output of the TokenStreams produced in this stage can usefully be cached and reused in the highlighting stage.

If ease of use is your concern, I would suggest wrapping the highlighter functionality with a simpler (Analyzer-based) interface rather than changing the internals of the highlighter implementation. That way more experienced users still have the option to use optimized alternatives in the underlying code.
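To make the wrapping idea concrete, here is a rough sketch of the layering. It deliberately avoids the real Lucene classes so that it runs on its own: Tok, SimpleAnalyzer, CoreHighlighter, EasyHighlighter and toyAnalyze are all made-up stand-ins for illustration, not part of the highlighter package. The core only sees tokens, and the convenience wrapper is the only place that knows about an analyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class HighlighterWrapperSketch {

    /** Stand-in for a Lucene Token: a term plus its offsets in the original text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in for an Analyzer: one (potentially costly) way of getting tokens. */
    interface SimpleAnalyzer {
        List<Tok> analyze(String text);
    }

    /** Core highlighter: takes tokens from any source (analysis, term vectors, a cache). */
    static final class CoreHighlighter {
        private final String queryTerm;

        CoreHighlighter(String queryTerm) {
            this.queryTerm = queryTerm.toLowerCase();
        }

        String highlight(List<Tok> tokens, String text) {
            StringBuilder out = new StringBuilder();
            int last = 0;
            for (Tok t : tokens) {
                if (t.term.equals(queryTerm)) {
                    // Copy the text before the match, then the marked-up match itself.
                    out.append(text, last, t.start)
                       .append("<B>").append(text, t.start, t.end).append("</B>");
                    last = t.end;
                }
            }
            return out.append(text, last, text.length()).toString();
        }
    }

    /** Convenience wrapper: the simpler analyzer-based interface layered on the core. */
    static final class EasyHighlighter {
        private final CoreHighlighter core;
        private final SimpleAnalyzer analyzer;

        EasyHighlighter(CoreHighlighter core, SimpleAnalyzer analyzer) {
            this.core = core;
            this.analyzer = analyzer;
        }

        String highlight(String text) {
            // Tokens are always derived from the same text, so they cannot mismatch it.
            return core.highlight(analyzer.analyze(text), text);
        }
    }

    /** Toy analyzer: lower-cases runs of letters/digits and records their offsets. */
    static List<Tok> toyAnalyze(String text) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                tokens.add(new Tok(text.substring(start, i).toLowerCase(), start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene makes search easy";
        CoreHighlighter core = new CoreHighlighter("search");

        // Simple path: the wrapper re-analyzes the text on every call.
        EasyHighlighter easy = new EasyHighlighter(core, HighlighterWrapperSketch::toyAnalyze);
        System.out.println(easy.highlight(text));

        // Optimized path: tokens produced in an earlier stage are reused as-is.
        List<Tok> cached = toyAnalyze(text);
        System.out.println(core.highlight(cached, text));
    }
}
```

Both calls in main go through the same core code. The wrapper gives the convenient signature, while callers that already hold tokens from an earlier stage skip the re-analysis.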

Cheers,
Mark



Daniel Naber wrote:

Hi,

the Highlighter's getBestFragment method takes a TokenStream and a text. Wouldn't it be easier to give it just the text and an analyzer so the user doesn't have to care about building a TokenStream? Like this:

public final String getBestFragment(Analyzer analyzer, String text)
    throws IOException
{
    TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text));
    return getBestFragment(tokenStream, text);
}

The old method could then be deprecated. Or am I missing something? This would also avoid problems in case the stream doesn't match the text.

Regards
Daniel






