>> Folks have benchmarked this, and, for documents less than 10k characters or so,
>> re-tokenizing is fast enough.
As a note of warning: I did find StandardTokenizer to be the major culprit in my tokenizing benchmarks (an average of 75ms for 16k-sized docs). I have found I can live without StandardTokenizer in my apps.

>> The simplest is to not scan past the first 10k or so for snippets.

A maximum number of tokens will be a new feature in the new highlighter.

Cheers
Mark
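A minimal sketch of the two limits discussed above (scanning only the first N characters of a document, and capping the number of tokens), in plain Java. It does not use Lucene at all: the regex tokenizer is a stand-in assumption, not StandardTokenizer, and `BoundedTokenizer`, `tokenize`, `maxChars` and `maxTokens` are hypothetical names, not the highlighter's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedTokenizer {
    // Hypothetical stand-in tokenizer: runs of letters/digits.
    // This is NOT Lucene's StandardTokenizer, just a cheap illustration.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /**
     * Tokenizes only the first maxChars characters of text and stops
     * after maxTokens tokens, so the cost of re-tokenizing for snippets
     * is bounded regardless of how long the document is.
     */
    public static List<String> tokenize(String text, int maxChars, int maxTokens) {
        // Look only at the head of the document (e.g. the first ~10k chars).
        String head = text.length() > maxChars ? text.substring(0, maxChars) : text;
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(head);
        // Stop early once the token budget is spent.
        while (tokens.size() < maxTokens && m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Build a long synthetic document: "word0 word1 ... word4999 ".
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            doc.append("word").append(i).append(' ');
        }
        List<String> tokens = tokenize(doc.toString(), 10_000, 1_000);
        System.out.println(tokens.size() + " tokens, last = " + tokens.get(tokens.size() - 1));
    }
}
```

Cutting at a fixed character offset can split the last word in half (the tests below show "beta" becoming "be"); a real implementation would probably back up to the previous token boundary, or apply only the token cap and let the tokenizer decide where to stop.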
