Let the numbers speak.

INDEX SIZE: 58 million docs, 2.5 GB on disk
- two tokenized Fields, each averaging 4 tokens (rather small), approx. 2 million unique tokens
- one binary stored field (one VInt)
- HW: commodity AMD PC, 2.8 GHz (or so), 2 GB RAM, single disk, Win XP 64-bit, JVM 6.0 32-bit

Before LUCENE-843, indexing speed was 5-6k records per second (and I believed this was already as fast as it gets). After (trunk version as of yesterday): 60-65k documents per second! All (exhaustive!) tests pass on this index.

Settings: autocommit = false, 24M RAMBuffer, using char[] instead of String for Token (this was the reason we separated analysis into two phases, leaving only simple whitespace tokenization to the Lucene Analyzer).

Brilliant work, nothing more and nothing less!
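
To put the reported rates in perspective, here is a minimal sketch of the arithmetic. It assumes the rates hold constant over the whole 58 million document build, which the measurements above do not guarantee. The class and method names are made up for illustration and are not part of Lucene.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: back-of-the-envelope build times
// derived from the throughput figures reported in this message.
public class IndexingThroughput {

    /** Seconds needed to index docs documents at a constant docsPerSecond. */
    static double secondsToIndex(long docs, double docsPerSecond) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("docsPerSecond must be positive");
        }
        return docs / docsPerSecond;
    }

    /** Ratio of the new rate to the old rate. */
    static double speedup(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("before must be positive");
        }
        return after / before;
    }

    public static void main(String[] args) {
        long docs = 58_000_000L;                 // index size from the report
        double[] before = {5_000.0, 6_000.0};    // docs/s before LUCENE-843
        double[] after = {60_000.0, 65_000.0};   // docs/s on trunk

        for (double rate : before) {
            System.out.printf(Locale.ROOT, "before: %.0f docs/s -> %.1f min%n",
                    rate, secondsToIndex(docs, rate) / 60.0);
        }
        for (double rate : after) {
            System.out.printf(Locale.ROOT, "after:  %.0f docs/s -> %.1f min%n",
                    rate, secondsToIndex(docs, rate) / 60.0);
        }
        // Most conservative and most optimistic pairings of the two ranges.
        System.out.printf(Locale.ROOT, "speedup: %.1fx to %.1fx%n",
                speedup(before[1], after[0]), speedup(before[0], after[1]));
    }
}
```

Under that constant-rate assumption, a full rebuild drops from roughly three hours to roughly a quarter of an hour, a speedup of about 10x to 13x.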