A question before I dive into coding a fix: can I assume, for all analyzers, that the tokens produced by the token stream have the following property?

    currentToken.startOffset() >= lastToken.startOffset()
The analyzers I have tested the highlighter with so far have the stricter property currentToken.startOffset() > lastToken.endOffset(), so their tokens never overlap. I understand this isn't the case for other analyzers (any demonstrable examples of such "problem" analyzers would be appreciated for testing purposes).

If I can assume that token streams never produce a token whose startOffset is lower than that of the previous token, I think I can design a solution that still works in a single pass over the token stream. I suspect an additional "flushText" method will be required on the Formatter interface to allow implementations to use a buffer. The buffer would accumulate the scores of overlapping tokens, so that the decision on whether a section of the original text merits any highlight markup is made only once all tokens covering that section have been seen.

Cheers
Mark
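
A minimal sketch of the single-pass buffering idea described above, assuming start offsets never decrease. It is not the highlighter's actual code: ScoredToken, TokenGroup and OverlapBuffer are hypothetical names standing in for Lucene's token and scoring classes, and the point where a group is flushed is roughly where the proposed "flushText" call would sit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token: offsets into the original text plus a score.
final class ScoredToken {
    final int startOffset;
    final int endOffset;
    final float score;

    ScoredToken(int startOffset, int endOffset, float score) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.score = score;
    }
}

// Hypothetical buffer entry: a run of overlapping tokens and their summed score.
final class TokenGroup {
    final int startOffset;
    int endOffset;
    float totalScore;

    TokenGroup(int startOffset, int endOffset, float score) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.totalScore = score;
    }
}

final class OverlapBuffer {
    // Groups overlapping tokens in one pass. Relies on the assumption that
    // each token's startOffset is >= the previous token's startOffset.
    static List<TokenGroup> group(List<ScoredToken> tokens) {
        List<TokenGroup> groups = new ArrayList<>();
        TokenGroup current = null;
        int lastStart = Integer.MIN_VALUE;

        for (ScoredToken t : tokens) {
            if (t.startOffset < lastStart) {
                throw new IllegalArgumentException("start offsets must not decrease");
            }
            lastStart = t.startOffset;

            if (current != null && t.startOffset < current.endOffset) {
                // Token overlaps the buffered text: extend the group, add its score.
                current.endOffset = Math.max(current.endOffset, t.endOffset);
                current.totalScore += t.score;
            } else {
                // No overlap: flush the buffered group and start a new one.
                if (current != null) {
                    groups.add(current);
                }
                current = new TokenGroup(t.startOffset, t.endOffset, t.score);
            }
        }
        if (current != null) {
            groups.add(current); // final flush at end of stream
        }
        return groups;
    }
}
```

Each flushed group covers one contiguous section of the original text, so a formatter could decide once per group, from totalScore, whether to emit highlight markup. Because starts never decrease, a token that begins at or after the buffered end offset guarantees no later token can reach back into the buffered text, which is what makes a single pass sufficient.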
