It's odd to have a ~500X difference in writes versus reads.  Are you sure?
Is it possible you are also opening IndexReaders and searching the commit
points?

Lucene does re-read previously written (already indexed) documents during
segment merges.  But at default settings (as long as you did not change
merge settings in IndexWriterConfig) this read/write amplification should
be log(N) in cost, i.e. maybe ~5X at worst, not ~500X.
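
To put rough numbers on that log(N) claim, here is a back-of-envelope sketch in plain Java (no Lucene dependency).  It assumes a deliberately simplified model: every merge combines a fixed number of equal-sized segments, and each indexed byte is rewritten once per merge level.  The merge factor of 10 and the class and method names are my assumptions for illustration, not Lucene's actual merge policy logic.

```java
public class MergeAmplificationEstimate {

    /**
     * Counts how many merge levels it takes to reduce the given number of
     * flushed segments to a single segment when every merge combines
     * mergeFactor segments. In this simplified model each level rewrites
     * (and re-reads) every byte once.
     */
    static int mergeLevels(long flushedSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        int levels = 0;
        long segments = flushedSegments;
        while (segments > 1) {
            // Ceiling division: a partially filled group still needs one merge.
            segments = (segments + mergeFactor - 1) / mergeFactor;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        int mergeFactor = 10; // assumed value, for illustration only
        for (long flushes : new long[] {10L, 1_000L, 100_000L}) {
            System.out.println(flushes + " flushed segments -> about "
                + mergeLevels(flushes, mergeFactor) + " rewrites per byte");
        }
    }
}
```

Under these assumptions even 100,000 flushed segments come out at about 5 rewrites per byte, which is the order of magnitude I would expect, nowhere near 500.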

More comments inlined below:

On Wed, Sep 4, 2024 at 7:07 AM Gopal Sharma <gopal.sha...@algonomy.com>
wrote:

> In my use case i am committing every 100k records (because in my test
> scenarios committing per million was taking a lot of time)
>

Setting as large an IndexWriterConfig.setRAMBufferSizeMB as possible, and
committing as rarely as possible, should minimize overall IO (lower
read/write amplification due to merging).
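
A minimal sketch of what I mean, assuming a recent Lucene with StandardAnalyzer on the classpath.  The 1024 MB buffer, the "id" field, the document count and the class name are placeholders I made up; pick a buffer size that fits your heap.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexSketch {
    public static void main(String[] args) throws IOException {
        Path indexPath = Paths.get(args[0]); // index location passed on the command line

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // Larger buffer -> fewer, larger flushed segments -> less merging.
        // 1024 MB is an illustrative value, not a recommendation.
        config.setRAMBufferSizeMB(1024.0);

        try (Directory dir = FSDirectory.open(indexPath);
             IndexWriter writer = new IndexWriter(dir, config)) {
            for (int i = 0; i < 1_000_000; i++) {
                Document doc = new Document();
                doc.add(new StringField("id", Integer.toString(i), Field.Store.YES));
                writer.addDocument(doc);
            }
            // One commit at the end instead of one per 100k documents.
            writer.commit();
        }
    }
}
```

The trade-off is durability: documents added since the last commit are lost if the process crashes, so how rarely you can commit depends on how much re-indexing you can tolerate.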


> Below is the snippet on how i am instantiating lucene writter
>
> FSDirectory indexDirectory = NIOFSDirectory.open(indexDir.toPath());
>

Have you tried MMapDirectory instead?  NIOFSDirectory does its own buffered
reads, so I wonder if your odd "read amplification" is somehow caused by
that?  If so, that would not be good: it would be a performance bug in
NIOFSDirectory.  I'd be curious whether switching fixes (works around) your
~500X read amplification during indexing.

Still, using MMapDirectory just foists the problem (which pages to
read/cache) onto the OS.  But Lucene's reads should be largely sequential,
so it ought to be easy for the OS to cache/readahead well and not burn too
much read IO from EFS.
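
If you want to test this, the swap is a one-line change.  A sketch, assuming the rest of your indexing code only depends on the Directory interface; the class name and the boolean command-line switch are hypothetical, there only so you can run the same job both ways and compare the read IO that EFS reports.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class DirectoryChoiceSketch {

    /** Opens the index with either memory-mapped or NIO-based file access. */
    static Directory open(Path indexPath, boolean useMMap) throws IOException {
        return useMMap ? new MMapDirectory(indexPath) : new NIOFSDirectory(indexPath);
    }

    public static void main(String[] args) throws IOException {
        Path indexPath = Paths.get(args[0]);
        boolean useMMap = Boolean.parseBoolean(args[1]);
        try (Directory dir = open(indexPath, useMMap)) {
            // Pass dir to your IndexWriter exactly as before; here we only
            // confirm which implementation is in use.
            System.out.println(dir.getClass().getSimpleName()
                + " with " + dir.listAll().length + " files");
        }
    }
}
```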

Mike McCandless

http://blog.mikemccandless.com
