Some quick and completely unscientific benchmarks, indexing the same 10K ASCII document 1000 times:

RT = RegexTokenizer
ST = StandardTokenizer
CF = CaseFolder
N  = Normalizer

RT:    2.177s
RT+CF: 3.964s
RT+N:  2.556s
ST:    1.551s
ST+CF: 3.357s
ST+N:  1.931s
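
For what it's worth, the cost each extra stage adds on top of the bare tokenizer can be read off with a few subtractions. A small sketch of that arithmetic, using only the timings listed above (Python is just a convenient calculator here, and the helper name `added_cost` is made up for illustration):

```python
# Timings in seconds, copied from the list above.
timings = {
    "RT": 2.177, "RT+CF": 3.964, "RT+N": 2.556,
    "ST": 1.551, "ST+CF": 3.357, "ST+N": 1.931,
}

def added_cost(tokenizer, stage):
    """Seconds that `stage` adds on top of the bare tokenizer run."""
    return round(timings[f"{tokenizer}+{stage}"] - timings[tokenizer], 3)

for tokenizer in ("RT", "ST"):
    for stage in ("CF", "N"):
        print(f"{tokenizer}+{stage}: +{added_cost(tokenizer, stage):.3f}s")
```

With only one run per configuration, these differences are as unscientific as the timings themselves.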

It's also interesting that moving the tokenizer in front of the case folder or normalizer always gave me faster results.
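
If anyone wants to play with the ordering question outside the indexer, here is a rough sketch of how the two orderings could be compared. Everything in it is a stand-in and an assumption on my part: a plain `\w+` regex plays the tokenizer, `str.lower()` plays the case folder, and the document is synthetic. None of it is the real analyzer code, so whatever numbers it prints say nothing about the benchmarks above; it only shows the shape of the comparison.

```python
import re
import timeit

# Stand-ins (assumptions): a simple regex tokenizer and str.lower() as case folding.
TOKEN_RE = re.compile(r"\w+")

def tokenize_then_fold(text):
    """Tokenizer first, then case-fold each token."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

def fold_then_tokenize(text):
    """Case-fold the whole text first, then tokenize."""
    return TOKEN_RE.findall(text.lower())

# Synthetic document of roughly 10K ASCII characters (45 chars * 220).
doc = "The Quick Brown Fox Jumps Over The Lazy Dog. " * 220

def compare(runs=1000):
    """Time both orderings over the same document and print the totals."""
    for fn in (tokenize_then_fold, fold_then_tokenize):
        elapsed = timeit.timeit(lambda: fn(doc), number=runs)
        print(f"{fn.__name__}: {elapsed:.3f}s for {runs} runs")

compare()
```

For ASCII input both orderings produce the same tokens, so only the time differs.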

Nick
