Re: Performance of hit highlighting and finding term positions for

markharw00d Wed, 31 Mar 2004 10:04:45 -0800

>>Folks have benchmarked this, and, for documents less than 10k characters or so, 
>>re-tokenizing is fast enough.


As a note of warning: I found StandardTokenizer to be the major culprit in my 
tokenizing benchmarks (avg 75 ms for 16k-sized docs).
I have found I can live without StandardTokenizer in my apps.

>> The simplest is to not scan past the first 10k or so for snippets 
A maximum number of tokens will be a new feature in the new highlighter.
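
To illustrate the idea of a cap, here is a minimal sketch in plain Java. It is 
not the highlighter's API: the class and method names (CappedTokenScan, 
firstTokens, firstChars) are hypothetical, and the letter/digit splitter is only 
a stand-in for a real Lucene analyzer. It shows the two limits discussed in this 
thread: stop after a maximum number of tokens, or cut the text at a maximum 
number of characters before tokenizing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or the highlighter.
class CappedTokenScan {

    /**
     * Returns at most maxTokens tokens from the start of text.
     * A token here is a run of letters/digits, lower-cased; this is a
     * simplified stand-in for whatever analyzer the application uses.
     */
    static List<String> firstTokens(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n && tokens.size() < maxTokens) {
            // Skip separators before the next token.
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of letters/digits.
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
            }
        }
        // Scanning stops here; the rest of the text is never examined.
        return tokens;
    }

    /** Character-based cap: keep only the first maxChars characters. */
    static String firstChars(String text, int maxChars) {
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    public static void main(String[] args) {
        String doc = "Hit highlighting re-tokenizes the stored text of each document.";
        System.out.println(firstTokens(doc, 4));
        System.out.println(firstChars(doc, 16));
    }
}
```

Either cap bounds the tokenizing cost per document regardless of document size, 
at the price of missing hits that occur only after the cut-off.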

Cheers
Mark

