On Jul 5, 2005, at 4:58 PM, Terence Lai wrote:
I am currently using Lucene 1.4.2 with the highlighter downloaded
from Lucene In Action.
The Highlighter class provides the following method to highlight
the terms specified in the Query:
/**
 * Highlights chosen terms in a text, extracting the most relevant section.
 * The document text is analysed in chunks to record hit statistics
 * across the document. After accumulating stats, the fragment with
 * the highest score is returned.
 *
 * @param tokenStream a stream of tokens identified in the text
 * parameter, including offset information.
 * This is typically produced by an analyzer re-parsing a document's
 * text. Some work may be done on retrieving TokenStreams more efficiently
 * by adding support for storing original text position data in the Lucene
 * index but this support is not currently available (as of Lucene 1.4 rc2).
 * @param text text to highlight terms in
 *
 * @return highlighted text fragment or null if no terms found
 */
public final String getBestFragment(TokenStream tokenStream, String text)
    throws IOException;
According to the javadoc, this method only returns the most
relevant section of the text. Is there any way to return the
ENTIRE text with the terms highlighted?
Yes - how much text comes back depends on the Fragmenter. For
lucenebook.com, for example, if a search result is for a blog entry,
the entire contents are highlighted using a NullFragmenter, which
never splits the text into fragments:
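package lia.web;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.search.highlight.Fragmenter;

// Fragmenter that never breaks the text, so the highlighter
// treats the whole document as a single fragment.
public class NullFragmenter implements Fragmenter {
    public void start(String s) {
        // no per-document state to reset
    }

    public boolean isNewFragment(Token token) {
        return false; // never start a new fragment
    }
}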
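
To see why returning false from isNewFragment yields the whole text, here is a rough, self-contained sketch of the idea. It does not use Lucene: the Fragmenter interface below is a simplified, hypothetical stand-in that takes plain String tokens instead of Lucene's Token, and FragmenterSketch, EveryNTokensFragmenter and split are names made up for illustration. The real Highlighter does considerably more (scoring, offsets, markup), so treat this only as a model of the fragment-splitting decision.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmenterSketch {

    /** Hypothetical stand-in for Lucene's Fragmenter; tokens are plain strings here. */
    public interface Fragmenter {
        void start(String originalText);
        boolean isNewFragment(String token);
    }

    /** Never starts a new fragment, so the whole text stays in one fragment. */
    public static class NullFragmenter implements Fragmenter {
        public void start(String originalText) { }
        public boolean isNewFragment(String token) { return false; }
    }

    /** Made-up fragmenter for contrast: starts a new fragment every n tokens. */
    public static class EveryNTokensFragmenter implements Fragmenter {
        private final int n;
        private int seen;

        public EveryNTokensFragmenter(int n) { this.n = n; }

        public void start(String originalText) { seen = 0; }

        public boolean isNewFragment(String token) {
            boolean isNew = seen > 0 && seen % n == 0;
            seen++;
            return isNew;
        }
    }

    /** Walks whitespace-separated tokens and cuts wherever the fragmenter says so. */
    public static List<String> split(String text, Fragmenter fragmenter) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        fragmenter.start(text);
        for (String token : text.split(" ")) {
            if (fragmenter.isNewFragment(token) && current.length() > 0) {
                fragments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
        }
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // prints [the quick brown, fox jumps over, the lazy dog]
        System.out.println(split(text, new EveryNTokensFragmenter(3)));
        // prints [the quick brown fox jumps over the lazy dog]
        System.out.println(split(text, new NullFragmenter()));
    }
}
```

With only one fragment to choose from, the "best fragment" is necessarily the entire text. In the real highlighter the fragmenter is handed over before calling getBestFragment; if I remember the sandbox API correctly that is Highlighter.setTextFragmenter(new NullFragmenter()), but check the javadoc of the version you have.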
Erik