The Highlighter's getBestFragment method takes a TokenStream and a text.
Wouldn't it be easier to give it just the text and an analyzer?
That's how it was originally coded. The move to TokenStream was a
deliberate choice, made in order to decouple the highlighter from the source of
tokens and enable alternatives. Re-analyzing document text with an Analyzer is
one (potentially costly) way of getting Tokens. Another is to use the new
TermVector support (see TokenSources.java in the highlighter package). In my
apps I have query processing stages which use TokenStreams to extract themes
from result sets. The tokens produced in that stage can usefully be cached and
reused in the highlighting stage.
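
To make the caching point concrete, here is a minimal sketch in plain Java. It
does not use the Lucene classes: Tok, CountingTokenizer and TokenCache are
hypothetical stand-ins for a token, a costly analysis step and a per-document
cache, and the whitespace tokenizer is only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an analyzed token: a term plus its offsets in the text.
final class Tok {
    final String term;
    final int start;
    final int end;

    Tok(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

// Stand-in for a costly analysis step; counts how often it actually runs.
final class CountingTokenizer {
    int runs = 0;

    List<Tok> tokenize(String text) {
        runs++;
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace, then consume one run of non-whitespace characters.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Tok(term, start, i));
            }
        }
        return tokens;
    }
}

// Caches tokens per document id so later stages can reuse them.
final class TokenCache {
    private final Map<String, List<Tok>> cache = new HashMap<>();
    private final CountingTokenizer tokenizer;

    TokenCache(CountingTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    List<Tok> tokensFor(String docId, String text) {
        // Analyze only on the first request for this document.
        return cache.computeIfAbsent(docId, id -> tokenizer.tokenize(text));
    }
}
```

A theme-extraction stage and a highlighting stage that both call tokensFor
with the same document id would share one analysis pass. That is only possible
when the consumer accepts tokens rather than insisting on an analyzer.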
If ease of use is your concern, I would suggest wrapping the highlighter
functionality with a simpler (Analyzer-based) interface rather than changing
the internals of the highlighter implementation. That way more experienced
users still have the option to use optimized alternatives in the underlying
code.
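
A rough sketch of that layering, again in plain Java rather than against the
real highlighter classes. Span, WhitespaceAnalysis, CoreHighlighter and
SimpleHighlighter are hypothetical names, and the <B> markup and whitespace
analysis are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical token: a normalized term plus its offsets in the original text.
final class Span {
    final String term;
    final int start;
    final int end;

    Span(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical analysis function: lowercased, whitespace-separated terms.
final class WhitespaceAnalysis {
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                spans.add(new Span(term, start, i));
            }
        }
        return spans;
    }
}

// Core layer: works on tokens from any source (fresh analysis, cache, stored vectors).
final class CoreHighlighter {
    private final Set<String> queryTerms;

    CoreHighlighter(Set<String> queryTerms) {
        this.queryTerms = queryTerms;
    }

    // Assumes tokens are ordered by offset and do not overlap.
    String highlight(List<Span> tokens, String text) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Span t : tokens) {
            if (queryTerms.contains(t.term)) {
                sb.append(text, last, t.start);
                sb.append("<B>").append(text, t.start, t.end).append("</B>");
                last = t.end;
            }
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }
}

// Convenience layer: hides token handling behind a text-only method.
final class SimpleHighlighter {
    private final CoreHighlighter core;
    private final Function<String, List<Span>> analysis;

    SimpleHighlighter(CoreHighlighter core, Function<String, List<Span>> analysis) {
        this.core = core;
        this.analysis = analysis;
    }

    String highlight(String text) {
        return core.highlight(analysis.apply(text), text);
    }
}
```

Casual callers would only ever touch SimpleHighlighter.highlight(text), while
callers that already hold tokens can go straight to CoreHighlighter.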
Cheers,
Mark
Daniel Naber wrote:
Hi,
the Highlighter's getBestFragment method takes a TokenStream and a text.
Wouldn't it be easier to give it just the text and an analyzer so the user
doesn't have to care about building a TokenStream? Like this:
public final String getBestFragment(Analyzer analyzer, String text)
    throws IOException
{
    TokenStream tokenStream = analyzer.tokenStream("field",
        new StringReader(text));
    return getBestFragment(tokenStream, text);
}
The old method could then be deprecated. Or am I missing something? This
would also avoid problems in case the stream doesn't match the text.
Regards
Daniel