>>Folks have benchmarked this, and, for documents less than 10k characters or so, 
>>re-tokenizing is fast enough.

A note of warning: in my tokenizing benchmarks, StandardTokenizer was the 
major culprit (avg 75ms for 16k-sized docs).
I have found I can live without StandardTokenizer in my apps.

>> The simplest is to not scan past the first 10k or so for snippets 
A maximum number of tokens will be a new feature in the new highlighter.
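
To illustrate the idea of a token cap, here is a rough sketch. It is not the highlighter's code: `firstTokens` and `maxTokens` are names made up for this example, and plain whitespace splitting stands in for a real Analyzer. The point is only that scanning stops once the cap is hit, so the cost stays bounded however large the doc is.

```java
import java.util.ArrayList;
import java.util.List;

public class CappedTokenScan {
    // Hypothetical helper, not a Lucene API: splits on whitespace and stops
    // as soon as maxTokens tokens have been collected.
    static List<String> firstTokens(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n && tokens.size() < maxTokens) {
            // Skip any whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps over the lazy dog";
        System.out.println(firstTokens(doc, 4)); // prints [the, quick, brown, fox]
    }
}
```

Anything past the cap is simply never tokenized, so snippets can only come from the start of the doc.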

Cheers
Mark
