I'm not sure how the current Highlighter works (I haven't had time to look into it yet), but I have been thinking about the following implementation. Judging by your question, it works somewhat differently from the current Highlighter:

1. Build a Radix tree (PATRICIA) and populate it with all search terms. A phrase query is treated as one big string, spaces included.

2. Iterate through your text, skipping the spaces and punctuation marks between words, and at the start of each word begin a Radix lookup letter by letter. E.g. for the word John you start a traversal of the tree with j, then o, h, and so on. If "John Kennedy" is one of your terms, the lookup does not stop at the space but follows it as part of the string (I'd say you'd want to keep only one space and ignore any extra ones). On a dead end (no relevant branch in the tree), just skip to the next space or punctuation mark and restart the process.

3. If a lookup completes successfully, i.e. it reaches the end of a term, the MyHighlighter object will make the matched text bold or surround it with a span. The span can then be given an ID based on the term (each term a different ID), so each term is highlighted in a different color.

This allows for fast and exact highlighting of large texts as well as smaller ones.

I would love to hear any comments on the above.

Itamar.

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 18, 2008 10:51 PM
To: java-user@lucene.apache.org
Subject: Re: Contrib Highlighter and Phrase search

The contrib Highlighter is not position sensitive. You can try out the patch I have been working here if you are interested: https://issues.apache.org/jira/browse/LUCENE-794

Spencer Tickner wrote:
> Hi List,
>
> Thanks in advance for any help. I'm working with the contrib
> highlighting class and am having issues when doing searches with a
> phrase. I've been able to duplicate this behaviour in the
> HighlighterTest class.
>
> When calling the testGetBestFragmentsPhrase() method I get the correct:
>
> <b>John</b> <b>Kennedy</b> has been shot
>
> However when I add "John" to the text so it reads:
>
> "John Kennedy has been John shot"
>
> I get:
>
> <b>John</b> <b>Kennedy</b> has been<b>John</b> shot
>
> It makes sense to me that John should not be highlighted. Has anyone
> else run into this problem? Anyone have a fix?
>
> Cheers,
>
> Spencer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
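
Addendum: the three steps proposed at the top of this message could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested replacement for the contrib Highlighter. It uses a plain character trie rather than a compressed Radix/PATRICIA tree, the class `TrieHighlighterSketch` and its `highlight` method are hypothetical names (nothing here is Lucene API), matching is case-insensitive, a run of whitespace in the text counts as a single space, and a match only counts if it ends at a word boundary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Lucene or the contrib Highlighter.
final class TrieHighlighterSketch {

    /** One trie node; termId >= 0 marks the end of a search term. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        int termId = -1;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Step 1: store every term; a phrase is one string with single spaces. */
    private static Node buildTrie(List<String> terms) {
        Node root = new Node();
        for (int id = 0; id < terms.size(); id++) {
            String term = terms.get(id).trim().replaceAll("\\s+", " ");
            if (term.isEmpty()) {
                continue;
            }
            Node node = root;
            for (char c : term.toCharArray()) {
                node = node.next.computeIfAbsent(Character.toLowerCase(c), k -> new Node());
            }
            node.termId = id;
        }
        return root;
    }

    /** Steps 2 and 3: scan the text and wrap each matched term in a span. */
    static String highlight(String text, List<String> terms) {
        Node root = buildTrie(terms);
        StringBuilder out = new StringBuilder();
        int n = text.length();
        int i = 0;
        while (i < n) {
            if (!isWordChar(text.charAt(i))) {
                out.append(text.charAt(i++)); // spaces and punctuation pass through
                continue;
            }
            // i is at the start of a word: walk the trie from the root.
            Node node = root;
            int j = i;
            int matchEnd = -1;
            int matchId = -1;
            while (j < n) {
                char d = text.charAt(j);
                char key = Character.toLowerCase(d);
                int step = 1;
                if (Character.isWhitespace(d)) {
                    key = ' '; // keep one space, ignore the rest of the run
                    while (j + step < n && Character.isWhitespace(text.charAt(j + step))) {
                        step++;
                    }
                }
                node = node.next.get(key);
                if (node == null) {
                    break; // dead end: no relevant branch
                }
                j += step;
                boolean atBoundary = j == n || !isWordChar(text.charAt(j));
                if (node.termId >= 0 && atBoundary) {
                    matchEnd = j; // remember the longest complete term so far
                    matchId = node.termId;
                }
            }
            if (matchEnd >= 0) {
                out.append("<span id=\"term").append(matchId).append("\">")
                   .append(text, i, matchEnd).append("</span>");
                i = matchEnd;
            } else {
                // No term starts here: copy this word and restart at the next one.
                int k = i;
                while (k < n && isWordChar(text.charAt(k))) {
                    k++;
                }
                out.append(text, i, k);
                i = k;
            }
        }
        return out.toString();
    }
}
```

With the phrase "John Kennedy" as the only term, calling `TrieHighlighterSketch.highlight("John Kennedy has been John shot", List.of("John Kennedy"))` wraps only the first two words, because the lookup that starts at the second "John" hits a dead end at "shot". Note that if the same term matches more than once, the same `id` is emitted repeatedly; a real implementation would probably use a class attribute instead.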