"Peter Keegan" <[EMAIL PROTECTED]> wrote:
> I did some performance comparison testing of Lucene 2.0 vs. trunk (with
> LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
> DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
> yet, the total time to build the index is much shorter because I can now
> build the entire 3GB index (900K docs) in one segment in RAM (using
> FSDirectory) and flush it to disk at the end. Before, I had to build smaller
> segments (20K docs), merge after 20 segments and then optimize at the end.

Awesome :)

> The memory usage with LUCENE-843 is much lower, presumably because stored
> fields and term vectors no longer sit in RAM.

Right, not buffering the stored fields & term vectors in RAM is a big
win.  In addition, storing the postings in RAM as a single shared hash
table backed by a pool of large byte[] arrays, rather than as separate
1 KB buffers for each file of a single-document segment, also improves
RAM efficiency.
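To illustrate why a shared pool of large blocks wastes less RAM than many small per-file buffers, here is a minimal sketch of just the pool half of that idea (the shared hash table is omitted). The class name, the 32 KB block size and the `append`/`read` methods are all hypothetical and chosen for illustration; this is not the actual DocumentsWriter code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a shared byte-block pool; not Lucene's implementation. */
public class BlockPoolSketch {
    // Assumed block size for illustration only.
    static final int BLOCK_SIZE = 32 * 1024;

    private final List<byte[]> blocks = new ArrayList<>();
    // Write position in the last block; starting at BLOCK_SIZE forces allocation on first append.
    private int upto = BLOCK_SIZE;

    /** Copies data into the pool and returns its global address. */
    public long append(byte[] data) {
        if (data.length > BLOCK_SIZE) {
            throw new IllegalArgumentException("value larger than one block");
        }
        if (upto + data.length > BLOCK_SIZE) {
            // Not enough room left: start a fresh block (the old block's tail is left unused).
            blocks.add(new byte[BLOCK_SIZE]);
            upto = 0;
        }
        int blockIndex = blocks.size() - 1;
        System.arraycopy(data, 0, blocks.get(blockIndex), upto, data.length);
        long address = (long) blockIndex * BLOCK_SIZE + upto;
        upto += data.length;
        return address;
    }

    /** Reads length bytes starting at an address previously returned by append. */
    public byte[] read(long address, int length) {
        int blockIndex = (int) (address / BLOCK_SIZE);
        int offset = (int) (address % BLOCK_SIZE);
        byte[] out = new byte[length];
        System.arraycopy(blocks.get(blockIndex), offset, out, 0, length);
        return out;
    }

    public int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        BlockPoolSketch pool = new BlockPoolSketch();
        long a = pool.append("lucene".getBytes(StandardCharsets.UTF_8));
        long b = pool.append("postings".getBytes(StandardCharsets.UTF_8));
        System.out.println(a);                                                  // 0
        System.out.println(new String(pool.read(b, 8), StandardCharsets.UTF_8)); // postings
        System.out.println(pool.blockCount());                                   // 1
    }
}
```

Every term's bytes land in the same few large arrays, so the per-object overhead is paid once per block rather than once per small buffer.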

In my tests, using Europarl content with small docs (~100 terms = ~550
bytes per doc) and with stored fields & term vectors enabled, RAM
efficiency is 44X better than before.

> I also observed a 20-25% gain by reusing the Field objects. Implementing my
> own Fieldable class was too complicated, so I simply extended the Field
> class (after removing final) and added 2 setter methods:
> 
>       public void setValue(String value) {
>         this.fieldsData = value;
>       }
>       public void setValue(byte[] value) {
>         this.fieldsData = value;
>       }
> 
> Since this improved performance significantly, I would vote to either add
> setters to Field or make it extendable.

OK I've opened LUCENE-963 for this & attached a patch.
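For readers unfamiliar with the reuse pattern Peter describes, here is a self-contained sketch of it. `ReusableField` and `FakeIndexer` are hypothetical stand-ins, not Lucene's `Field` or `IndexWriter`; the only part taken from the email is the shape of the two `setValue` setters. The sketch assumes the indexer has fully consumed the document before the next `setValue` call, which is the case when a single thread calls `addDocument` sequentially.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins used only to show the Field reuse pattern. */
public class FieldReuseSketch {

    static final class ReusableField {
        private final String name;
        private Object fieldsData;

        ReusableField(String name) { this.name = name; }

        // Same shape as the two setters proposed in the email.
        void setValue(String value) { this.fieldsData = value; }
        void setValue(byte[] value) { this.fieldsData = value; }

        String name() { return name; }
        Object value() { return fieldsData; }
    }

    /** Hypothetical indexer that consumes the field synchronously. */
    static final class FakeIndexer {
        final List<String> indexed = new ArrayList<>();

        void addDocument(ReusableField f) {
            // Copy out what is needed before returning, so the caller may safely reuse f.
            indexed.add(f.name() + "=" + f.value());
        }
    }

    public static void main(String[] args) {
        FakeIndexer indexer = new FakeIndexer();
        // One field instance per indexing thread, reused for every document.
        ReusableField body = new ReusableField("body");
        for (String text : new String[] {"first doc", "second doc"}) {
            body.setValue(text);
            indexer.addDocument(body);
        }
        System.out.println(indexer.indexed); // [body=first doc, body=second doc]
    }
}
```

The saving comes from not allocating (and later garbage-collecting) a new field object for every document; the trade-off is that a reused field must never be shared across threads.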

> Kudos to Mike for this huge improvement!

Thanks!

Mike
