"Peter Keegan" <[EMAIL PROTECTED]> wrote:
> I did some performance comparison testing of Lucene 2.0 vs. trunk (with
> LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
> DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
> yet, the total time to build the index is much shorter because I can now
> build the entire 3GB index (900K docs) in one segment in RAM (using
> FSDirectory) and flush it to disk at the end. Before, I had to build smaller
> segments (20K docs), merge after 20 segments and then optimize at the end.
Awesome :)

> The memory usage with LUCENE-843 is much lower, presumably because stored
> fields and term vectors no longer sit in RAM.

Right, not buffering the stored fields & term vectors in RAM is a big win.

In addition, the postings are now stored in RAM in a single shared hash
table backed by a pool of large byte[] arrays, instead of in separate 1 KB
buffers for each file of every single-document segment. This also improves
RAM efficiency. In my tests using Europarl content with small docs (~100
terms, ~550 bytes per doc) and with stored fields & term vectors enabled,
RAM efficiency is 44X better than before.

> I also observed a 20-25% gain by reusing the Field objects. Implementing my
> own Fieldable class was too complicated, so I simply extended the Field
> class (after removing final) and added 2 setter methods:
>
>   public void setValue(String value) {
>     this.fieldsData = value;
>   }
>   public void setValue(byte[] value) {
>     this.fieldsData = value;
>   }
>
> Since this improved performance significantly, I would vote to either add
> setters to Field or make it extendable.

OK, I've opened LUCENE-963 for this & attached a patch.

> Kudos to Mike for this huge improvement!

Thanks!

Mike
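To make the shared-pool idea concrete, here is a minimal Java sketch, assuming fixed 32 KB blocks. The class name SimpleBytePool and the block size are hypothetical; this is not the actual DocumentsWriter code, and it omits the shared term hash table and per-term slices. It only shows how bytes from many small documents land in a few large arrays rather than in one small buffer per file per document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: every document's postings bytes are appended to one
 *  shared pool of large byte[] blocks, addressed by a single global int. */
final class SimpleBytePool {
    static final int BLOCK_SIZE = 1 << 15; // 32 KB per block (assumed size)

    private final List<byte[]> blocks = new ArrayList<>();
    private byte[] current;
    private int upto = BLOCK_SIZE; // forces a block allocation on first write

    /** Appends one byte and returns its global address within the pool. */
    int writeByte(byte b) {
        if (upto == BLOCK_SIZE) {
            // Current block is full: allocate a new large block.
            current = new byte[BLOCK_SIZE];
            blocks.add(current);
            upto = 0;
        }
        int address = (blocks.size() - 1) * BLOCK_SIZE + upto;
        current[upto++] = b;
        return address;
    }

    /** Reads back the byte stored at a global address. */
    byte readByte(int address) {
        return blocks.get(address / BLOCK_SIZE)[address % BLOCK_SIZE];
    }

    /** Number of large blocks allocated so far. */
    int blockCount() {
        return blocks.size();
    }
}
```

With this layout, writing 100,000 bytes allocates only four 32 KB arrays, whereas one 1 KB buffer per small file would mean roughly a hundred separate allocations, each with its own object overhead and partially empty tail.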
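A minimal sketch of the Field-reuse pattern Peter describes, assuming a hypothetical ReusableField class that mirrors his two proposed setters. It is not org.apache.lucene.document.Field, and the API in the LUCENE-963 patch may differ. The idea is to allocate one field object per field name up front and reset its value for each document instead of allocating new objects every time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Field with the two proposed setters;
 *  NOT the real Lucene class. */
class ReusableField {
    private final String name;
    private Object fieldsData; // holds either a String or a byte[]

    ReusableField(String name, String value) {
        this.name = name;
        this.fieldsData = value;
    }

    void setValue(String value) { this.fieldsData = value; }
    void setValue(byte[] value) { this.fieldsData = value; }

    String name() { return name; }

    String stringValue() {
        return fieldsData instanceof String ? (String) fieldsData : null;
    }

    byte[] binaryValue() {
        return fieldsData instanceof byte[] ? (byte[]) fieldsData : null;
    }
}

/** Hypothetical indexing loop: a single field instance is reused for
 *  every document; only its value changes between iterations. */
class ReuseDemo {
    static List<String> indexAll(List<String> bodies) {
        ReusableField body = new ReusableField("body", "");
        List<String> indexed = new ArrayList<>();
        for (String text : bodies) {
            body.setValue(text); // reset the value; no new Field allocated
            // A real loop would call writer.addDocument(doc) here (hypothetical).
            indexed.add(body.name() + "=" + body.stringValue());
        }
        return indexed;
    }
}
```

Reusing the instance only works if the indexer has finished consuming the previous document's field values before the next setValue call, which holds for single-threaded indexing like the setup described above.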