Hi Daniel/Chris,

Unfortunately, the contrib/highlighter code in source control fails to meet our needs in two ways:

  1. We don't just want fragments, we want *all* of the text, with
     highlights in the appropriate places (although we do offer a means
     to display just the fragments as well), and

Pass a "NullFragmenter" to the highlighter constructor to turn off fragmentation.


  2. We don't deal with HTML, just plain text on a Swing text component.
     In other words we don't have to "format" or modify the text at all,
     just tell the Swing component which bits need to be highlighted.


Swing supports HTML and will do the highlight for you.
SwingText="<html>"+highlighter.getBestFragment(tokenStream,text)+"</html>";

If you don't like that approach and really just want to know the positions, plug in your own "Formatter" class which, instead of marking up the text, silently records the hit position information provided to it in the "TokenGroup" class and then returns the original string without adding any markup. TokenGroup handles the issue of identifying runs of overlapping tokens for you.
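To make the recording idea concrete, here is a rough sketch. It deliberately does not use the real highlighter classes: TokenRun and RunFormatter are simplified, hypothetical stand-ins for "TokenGroup" and "Formatter", and the field names are my own assumptions, so treat it as an illustration of the shape rather than code to paste in.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordingFormatterSketch {

    /** Hypothetical stand-in for TokenGroup: one run of overlapping tokens. */
    static final class TokenRun {
        final int startOffset;
        final int endOffset;
        final float totalScore;

        TokenRun(int startOffset, int endOffset, float totalScore) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.totalScore = totalScore;
        }
    }

    /** Hypothetical stand-in for the Formatter callback. */
    interface RunFormatter {
        String highlightTerm(String originalText, TokenRun run);
    }

    /** Records hit positions and hands the text back untouched. */
    static final class RecordingFormatter implements RunFormatter {
        private final List<int[]> hits = new ArrayList<>();

        @Override
        public String highlightTerm(String originalText, TokenRun run) {
            // Only runs that scored against the query count as hits.
            if (run.totalScore > 0) {
                hits.add(new int[] {run.startOffset, run.endOffset});
            }
            return originalText; // no markup added
        }

        /** Each entry is {startOffset, endOffset} of one hit. */
        List<int[]> getHits() {
            return hits;
        }
    }

    public static void main(String[] args) {
        RecordingFormatter formatter = new RecordingFormatter();
        formatter.highlightTerm("quick", new TokenRun(4, 9, 1.0f));
        formatter.highlightTerm("brown", new TokenRun(10, 15, 0.0f));
        for (int[] hit : formatter.getHits()) {
            System.out.println(hit[0] + "-" + hit[1]);
        }
    }
}
```

The recorded offsets can then be handed to whatever the Swing component uses to paint highlights, so the displayed text itself is never modified.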


Hoss, your pseudocode looked like a solution for identifying phrase queries. Lack of proper support for phrase queries is a known issue with the current highlighter, but I thought the primary issue in question here was speed?

The approach taken by the current highlighter is to maintain a HashSet of all unique query terms and check each token in the text's token stream for a hit on this set. As your code suggests, this could be made faster if there were multiple queries, all of which were PhraseQueries (with no slop factor!), because you would only need to check each phrase's "first terms" initially. I'm not sure this helps for non-phrase queries.

Also, I don't think hitting the index to work out which terms were a hit for the doc in question, in order to shorten the list of terms to highlight, is likely to speed things up. If anything, the extra disk IO is likely to slow it down. With regard to the question of overlapping tokens: the highlighter is robust in the face of marking these up.
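For anyone following along, the term-set check described above boils down to something like the sketch below. This is not the highlighter's actual code: the regex word splitter stands in for a real Analyzer/TokenStream, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermSetMatchSketch {

    // Assumption: a plain word pattern stands in for real tokenization.
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Returns {start, end} character offsets of every token found in queryTerms. */
    static List<int[]> findHits(String text, Set<String> queryTerms) {
        List<int[]> hits = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // One hash lookup per token, regardless of how many query terms there are.
            String term = m.group().toLowerCase(Locale.ROOT);
            if (queryTerms.contains(term)) {
                hits.add(new int[] {m.start(), m.end()});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("quick", "fox"));
        for (int[] hit : findHits("The quick brown fox", terms)) {
            System.out.println(hit[0] + "-" + hit[1]);
        }
    }
}
```

The cost is one pass over the document's tokens with a constant-time lookup each, and no index access at all, which is why I doubt an extra trip to the index would pay for itself.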

Cheers
Mark




        
        
                