> 730 msecs is the correct number for 10 * 16k docs with StandardTokenizer!
> The 11ms per doc figure in my post was for highlighting using a
> lower-case-filter-only analyzer. 5ms of this figure was the cost of the
> lower-case-filter-only analyzer.
>
> 73 msecs is the cost of JUST StandardTokenizer (no highlighting).
> StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It
> tries to keep certain text, e.g. email addresses, as one term. I can live without it and
> I suspect most apps can too. I haven't looked into why it's slow but I notice it does
> make use of Vectors. I think a lot of people's highlighter performance issues may
> stem from this.

Looking at StandardTokenizer I can't see anything that would slow it down much... can we get the source to your lower case filter?!

Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
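
For anyone who wants to reproduce per-document timings like the ones quoted in this message, here is a minimal sketch of a timing harness. It does not use any Lucene classes: `tokenize` is a hypothetical stand-in for a lower-case-only analyzer, `buildDoc` and `millisPerDoc` are made-up helper names, and "16k" is assumed to mean 16 * 1024 characters. To time a real tokenizer, the call to `tokenize` would be swapped for the tokenizer under test.

```java
import java.util.ArrayList;
import java.util.List;

public class LowerCaseTokenizerTiming {

    // Hypothetical stand-in for a lower-case-only analyzer (not Lucene code):
    // splits on non-letter characters and lower-cases each run of letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Builds a synthetic document of exactly sizeInChars characters.
    // Assumption: a "16k doc" is 16 * 1024 characters of plain English text.
    static String buildDoc(int sizeInChars) {
        String sentence = "The quick brown fox jumps over the lazy dog. ";
        StringBuilder sb = new StringBuilder(sizeInChars + sentence.length());
        while (sb.length() < sizeInChars) {
            sb.append(sentence);
        }
        return sb.substring(0, sizeInChars);
    }

    // Average wall-clock milliseconds per document over `docs` passes.
    static double millisPerDoc(String doc, int docs) {
        long tokenCount = 0; // consumed below so the loop cannot be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < docs; i++) {
            tokenCount += tokenize(doc).size();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (tokenCount < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsedNanos / 1_000_000.0 / docs;
    }

    public static void main(String[] args) {
        String doc = buildDoc(16 * 1024);
        millisPerDoc(doc, 10); // warm-up pass so JIT compilation is not measured
        System.out.printf("%.3f ms per 16k doc%n", millisPerDoc(doc, 10));
    }
}
```

The numbers this prints depend on the machine and JVM, so they are only comparable between tokenizers run in the same harness, not with the figures quoted in this thread.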