I guess you could look for alternative highlighters among the contributions for Java Lucene. About 2 years ago I used one that was faster (it required indexing with term vectors) and highlighted phrase searches properly. As far as I know, the most common highlighter doesn't handle phrases correctly: any word from the phrase you searched for gets highlighted. As for your problem, you could try a stemming analyzer when indexing, but I'm not sure whether it is relevant or will help.
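Another thing that might be worth trying, as a rough sketch only: keep the list of original query clauses ("merger", "agree*") and, for each term you extract, look up which clause it came from, then pick the formatter start tag by clause rather than by term. The code below is plain Java with no Lucene calls; the class name ClauseColors, the methods clauseIndexFor and startTagFor, and the example tags are all made up for illustration, and it assumes a trailing "*" is the only wildcard you need to handle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or Lucene.NET.
public class ClauseColors {

    // Returns the index of the first query clause that matches the term,
    // or -1 if no clause matches. A clause ending in '*' is treated as a
    // prefix; any other clause must match the term exactly (ignoring case).
    static int clauseIndexFor(String term, List<String> clauses) {
        String t = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i < clauses.size(); i++) {
            String clause = clauses.get(i).toLowerCase(Locale.ROOT);
            if (clause.endsWith("*")) {
                String prefix = clause.substring(0, clause.length() - 1);
                if (t.startsWith(prefix)) {
                    return i;
                }
            } else if (t.equals(clause)) {
                return i;
            }
        }
        return -1;
    }

    // Picks the start tag for a term based on the clause it belongs to,
    // so every expansion of one clause shares a color. Returns null when
    // the term matches no clause.
    static String startTagFor(String term, List<String> clauses, List<String> startTags) {
        int i = clauseIndexFor(term, clauses);
        return i < 0 ? null : startTags.get(i % startTags.size());
    }

    public static void main(String[] args) {
        List<String> clauses = Arrays.asList("merger", "agree*");
        // Assumed example tags, one per clause.
        List<String> tags = Arrays.asList(
                "<span class=\"hl0\">", "<span class=\"hl1\">");
        for (String term : Arrays.asList("merger", "agreements", "agreed", "agreement")) {
            System.out.println(term + " -> " + startTagFor(term, clauses, tags));
        }
    }
}
```

In your loop you would then use the clause index in place of nFormatter when choosing the start tag, so "agreements", "agreed" and "agreement" all end up with the tag assigned to "agree*".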
On Wed, Jun 24, 2009 at 4:36 PM, Nitin Shiralkar <[email protected]> wrote:

> Hi All,
>
> We are trying to implement multi-color highlighting in our Lucene.NET
> (v2.0) based search engine. We are using "Lucene.Net.Highlight" library for
> the same. Since we do not have any support for multi-color highlighting, we
> are doing that indirectly by extracting each term in search query and
> highlighting it individually with separate formatter.
>
> For example:
>
> String strQuery = "merger agree*" (without quotes)
> ---
> WeightedTerm[] terms = QueryTermExtractor.GetTerms(strQuery, false);
> ---
> ---
> SimpleHTMLFormatter formatter = new
> SimpleHTMLFormatter(_strFormatterStartTag[nFormatter], _strFormatterEndTag);
> --- loop to traverse each term ---
> WeightedTerm term = terms[nTerm];
> ---
> TermQuery termQuery = new TermQuery (new Term (FIELDNAME, term.GetTerm()));
> ---
> Highlighter highlighterContent = ---
>
> Problem:
>
> Above implementation is working fine. However all variations of "agree*"
> query term like "agreements", "agreed", "agreement" are being highlighted in
> separate color. I am not able to correlate all these variations to same
> original term "agree*" to highlight them in same color.
>
> Can anyone suggest me an alternate approach?
