"Samuru Jackson" <[EMAIL PROTECTED]> wrote on 27/02/2006 01:50:11 PM:
> Is there a way to retrieve a List of the matching words for a Hit?
> For example I create a query like this one:
> "Paris London -Stockholm"
> ...
> How do I know which words have been found in a document? In one it could be
> Paris, in another it could be London or both!
> I would need this information in order to highlight those words if I display
> the search results to the user.

For the purpose of highlighting, you don't necessarily need to know in advance
which word matched: you can just highlight any occurrence of either Paris or
London, wherever you find them, in the original text.
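
To make that concrete, here is a minimal sketch in plain Java of the "just
highlight every occurrence" approach. It is only an illustration: the class
SimpleHighlighter and its highlight() method are names I made up and are not
part of Lucene, and the sketch assumes the query terms appear in the text in
exactly the form given (case aside), which will not hold if your analyzer
stems or otherwise rewrites tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
final class SimpleHighlighter {

    // Wraps every whole-word, case-insensitive occurrence of any term in <b>...</b>.
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Quote each term so regex metacharacters in it are matched literally.
        List<String> quoted = new ArrayList<>();
        for (String term : terms) {
            quoted.add(Pattern.quote(term));
        }
        Pattern pattern = Pattern.compile(
                "\\b(?:" + String.join("|", quoted) + ")\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Keep the matched text as it appears in the document.
            String replacement = "<b>" + matcher.group() + "</b>";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the positive terms of "Paris London -Stockholm" are highlighted.
        System.out.println(highlight(
                "Flights from Paris to London leave daily.",
                Arrays.asList("Paris", "London")));
    }
}
```

Note that only the positive terms are passed in; the excluded term (Stockholm)
cannot occur in a matching document anyway.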

You might want to take a look at the Highlighter class in the contrib
directory of Lucene's distribution, which might do what you want. Here is some
example code: it creates a Highlighter object for highlighting the given query
"q", and then, for each of the results, it retrieves the full content of the
document from the stored "storedContent" field (which I added to the index),
finds the 2 most relevant fragments in the content, and highlights q's words
in them (this is similar to the summaries you see in Google and the like):

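    // Score fragments by how well they match the query q.
    Highlighter highlighter = new Highlighter(new QueryScorer(q));
    // ArbitraryLimits is not a Lucene class; substitute your own byte limit.
    highlighter.setMaxDocBytesToAnalyze(ArbitraryLimits.DocumentToSaveCutOff);
    for (int i = 0; i < hits.length(); i++) {  // or only the hits you display
        Document doc = hits.doc(i);
        String content = doc.get("storedContent");
        TokenStream tokenStream = analyzer.tokenStream("storedContent",
                new StringReader(content));
        // Join the 2 best-scoring fragments with " ... " between them.
        String summary = highlighter.getBestFragments(tokenStream,
                content, 2, " ... ");
    }
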
--
Nadav Har'El