Yeah, I should have mentioned - this was merely with a jar replacement, we haven't gotten around to doing fun 2.3-related stuff like making sure our domain-specific tokenizers use the next(Token), as well as making sure set all of our buffersizes by RAM used.
We tried multithreading the process, as we have a multi-core, multi-disk architecture, but for some reason we never saw more than 99% (of one core) cpu usage during indexing, as if some internal synchronization was getting hit... I should try it again through the profiler and see if I can pinpoint where it was getting tripped up. On the other hand, I'm not sure if we *need* faster than 26 minute indexing, so once we're sure we can move up to 2.3 for production, that may just solve our indexing perf issues. Now if I can just figure out how to speed up our query performance too, I'll be in an even *better* mood. :) -jake On Feb 3, 2008 2:11 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Awesome! We are glad to hear that :) > > You might be able to make it even faster with the steps here: > > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed > > Mike > > Jake Mannix wrote: > > > Hello all, > > I know you lucene devs did a lot of work on indexing performance > > in 2.3, > > and I just tested it out last thursday, so I thought I'd let you > > know how it > > fared: > > > > On a 2.17 million document index, a recent test gave indexing > > time to be: > > > > * lucene 2.2: 4.83 hours > > * lucene 2.3: 26 minutes > > > > About a factor of 11 speedup. Holy smokes! Great work folks. > > > > > > -jake > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >