Re: 2.3.2 Indexing Performance

Michael McCandless Wed, 01 Oct 2008 12:17:35 -0700


Awesome!  Thanks for following up.


Mike

Gary Moore wrote:

Finally got back to this. The great bulk of the time is spent parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M docs and feeding them to an IndexWriter took 106 minutes, including a final optimization. The index is 5.6 GB. I'm tempted to try multiple indexing threads, but my guess is it won't buy that much since the async writer more than kept up with the thread queue.
Now, I'm even more impressed with 2.3!
-Gary
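
A minimal sketch of the arrangement Gary describes: several parser threads feeding one writer through a bounded queue. This is not his code. The class name, the parse() helper, the queue size, and the counter standing in for IndexWriter.addDocument are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParseThenIndexSketch {
    // Unique sentinel object that tells the writer thread to stop.
    private static final String POISON = new String("POISON");

    // Hypothetical stand-in for the expensive parse/tokenize step.
    public static String parse(String raw) {
        return raw.trim().toLowerCase();
    }

    // Parses records on parserThreads threads and hands them to one writer thread.
    // Returns how many documents the writer thread received.
    public static int run(List<String> rawRecords, int parserThreads) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000); // assumed capacity
        AtomicInteger indexed = new AtomicInteger();

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc == POISON) break;      // identity check on the sentinel
                    indexed.incrementAndGet();     // real code would add doc to the index here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService parsers = Executors.newFixedThreadPool(parserThreads);
        for (String raw : rawRecords) {
            parsers.execute(() -> {
                try {
                    queue.put(parse(raw));         // blocks if the writer falls behind
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            parsers.shutdown();
            parsers.awaitTermination(1, TimeUnit.HOURS);
            queue.put(POISON);                     // all parsers are done; stop the writer
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" Rec One ", "Rec Two", "Rec Three"), 2));
    }
}
```

If the queue stays near empty during a run, the writer is keeping up and parsing is the bottleneck, which matches what Gary reports.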
Michael McCandless wrote:
Thanks for the data point!
This is expected -- a lot of work went into increasing IndexWriter's throughput in 2.3.
Actually, I'd expect even more speedup, if indeed Lucene is the bottleneck in your app. You could test how much time just creating/parsing & tokenizing the docs (from whatever is holding them) takes, to see. Also you might eke more performance out following the suggestions here:
   http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Since you've got 4 CPUs and lots of RAM you should definitely use multiple indexing threads with a large RAM buffer.
Mike
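
One rough way to run the parse-only measurement Mike suggests is to time the parse/tokenize pass with no IndexWriter involved and compare that against the full job. The sketch below assumes a hypothetical parseAndTokenize() helper standing in for the real MARC/XML parsing and analysis; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class ParseOnlyTiming {
    // Hypothetical stand-in for parsing a record and running it through an analyzer.
    public static String parseAndTokenize(String raw) {
        return String.join(" ", raw.toLowerCase().split("\\s+"));
    }

    // Times the parse/tokenize pass alone and returns elapsed nanoseconds.
    public static long timeParseOnlyNanos(List<String> rawRecords) {
        long start = System.nanoTime();
        long sink = 0;
        for (String raw : rawRecords) {
            sink += parseAndTokenize(raw).length(); // use the result so the work is not skipped
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable; keeps sink observable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("Title One  Author A", "Title Two Author B");
        System.out.println("parse-only ms: " + timeParseOnlyNanos(sample) / 1_000_000.0);
    }
}
```

If the parse-only time is close to the total run time, Lucene is not the bottleneck and tuning the writer will not change much.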

Gary Moore wrote:
Parsing and indexing 4.5 million MARC/XML bibliographic records was requiring ~14 hrs. using 2.2. The same job using 2.3 takes ~5 hrs. on the same platform -- a quad processor Sun V440 w/8GB memory. I'm using the PerFieldAnalyzerWrapper (StandardAnalyzer and SnowballAnalyzer).
I'm impressed!  Is this typical?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]