Re: amusing interaction between advanced tokenizers and highlighter

markharw00d Sat, 19 Jun 2004 11:46:09 -0700

A question before I dive into coding a fix: can I assume (for all analyzers) that
the tokens produced by the tokenStream have the following property:
   currentToken.startOffset() >= lastToken.startOffset()


The analyzers I have tested the highlighter with so far have the property:
   currentToken.startOffset() > lastToken.endOffset()
so their tokens never overlap, but I understand this isn't the case for others
(any demonstrable examples of such "problem" analyzers would be appreciated for
testing purposes).
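
To make the two properties concrete, here is a minimal sketch of how a captured
token sequence could be checked against each of them. It is only an illustration:
Tok is a hypothetical stand-in that carries just the two offsets, not the real
Lucene Token class, and both method names are made up for this example.

```java
import java.util.List;

public class OffsetCheck {
    // Hypothetical stand-in for an analyzer token; only the offsets matter here.
    static final class Tok {
        final int start;
        final int end;
        Tok(int start, int end) { this.start = start; this.end = end; }
    }

    // Weaker property: each token starts at or after the start of the previous one.
    static boolean startsNeverDecrease(List<Tok> toks) {
        for (int i = 1; i < toks.size(); i++) {
            if (toks.get(i).start < toks.get(i - 1).start) {
                return false;
            }
        }
        return true;
    }

    // Stricter property: each token starts after the previous token has ended,
    // so no two consecutive tokens cover the same text.
    static boolean neverOverlap(List<Tok> toks) {
        for (int i = 1; i < toks.size(); i++) {
            if (toks.get(i).start <= toks.get(i - 1).end) {
                return false;
            }
        }
        return true;
    }
}
```

A stream that emits both a compound word and its parts at the same position would
pass the first check but fail the second.
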
If I can assume that the startOffset of tokens in a token stream never decreases
from one token to the next, I think I can design a solution that still works
using a single pass of the token stream.
I suspect an additional "flushText" method will be required on the Formatter
interface to allow implementations to use a buffer. This buffer would be needed
to accumulate the scores of overlapping tokens when deciding whether a section of
the original text merits any highlight markup.
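
As a rough sketch of the single-pass idea (not the actual highlighter code),
overlapping tokens can be merged into one buffered region whose score is the sum
of the token scores, and the region is flushed once a token starts beyond it. The
Tok class, the Sink interface and the flushText signature below are all
assumptions made for illustration; the sketch relies on start offsets never
decreasing.

```java
import java.util.List;

public class OverlapBuffer {
    // Hypothetical scored token: offsets into the original text plus a query score.
    static final class Tok {
        final int start;
        final int end;
        final float score;
        Tok(int start, int end, float score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    // Hypothetical callback; stands in for whatever the formatter would receive.
    interface Sink {
        void flushText(int start, int end, float totalScore);
    }

    // Single pass over the tokens, assuming start offsets never decrease.
    static void process(List<Tok> toks, Sink sink) {
        boolean open = false;
        int regionStart = 0;
        int regionEnd = 0;
        float regionScore = 0f;
        for (Tok t : toks) {
            if (open && t.start < regionEnd) {
                // Token overlaps the buffered region: extend it and add its score.
                regionEnd = Math.max(regionEnd, t.end);
                regionScore += t.score;
            } else {
                // Token starts past the buffered region: emit it, start a new one.
                if (open) {
                    sink.flushText(regionStart, regionEnd, regionScore);
                }
                open = true;
                regionStart = t.start;
                regionEnd = t.end;
                regionScore = t.score;
            }
        }
        if (open) {
            sink.flushText(regionStart, regionEnd, regionScore);
        }
    }
}
```

The receiver can then decide, per flushed region, whether the accumulated score
merits highlight markup.
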

Cheers
Mark

