It's odd to have a ~500X difference in writes versus reads. Are you sure? Is it possible you are also opening IndexReaders and searching the commit points?
Lucene does re-read previously written (already indexed) documents during segment merges. But at default settings (as long as you did not change merge settings in IndexWriterConfig), this read/write amplification should be log(N) in cost, i.e. maybe ~5X at worst, not ~500X.

More comments inlined below:

On Wed, Sep 4, 2024 at 7:07 AM Gopal Sharma <gopal.sha...@algonomy.com> wrote:

> In my use case I am committing every 100k records (because in my test
> scenarios committing per million was taking a lot of time)

Setting IndexWriterConfig.setRAMBufferSizeMB as large as possible, and committing as rarely as possible, should minimize overall IO (lower read/write amplification due to merging).

> Below is the snippet on how I am instantiating the Lucene writer
>
> FSDirectory indexDirectory = NIOFSDirectory.open(indexDir.toPath());

Have you tried MMapDirectory instead? NIOFSDirectory does buffered reads, so I wonder if your odd "read amplification" is somehow caused by that. That would not be good; if so, it's a performance bug in NIOFSDirectory. I'd be curious whether switching fixes (works around) your ~500X read amplification during indexing.

Still, using MMapDirectory just foists the problem (which pages to read/cache) onto the OS. But Lucene's reads should be largely sequential, so it ought to be easy for the OS to cache/readahead well and not burn too much read IO from EFS.

Mike McCandless

http://blog.mikemccandless.com
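To put rough numbers on the log(N) merge cost mentioned above, here is a back-of-envelope sketch. It is a simplified model, not Lucene's actual TieredMergePolicy logic: it assumes every flushed segment holds `flushDocs` documents, each merge combines `mergeFactor` equal-sized segments (10 is assumed here, comparable to TieredMergePolicy's default segmentsPerTier), and merging continues until one segment holds everything. Real merging caps segment size (maxMergedSegmentMB) and merges unequal sizes, so treat the output as an order-of-magnitude estimate. The class and method names and the 100M-document corpus are hypothetical.

```java
/**
 * Hypothetical back-of-envelope model of merge read/write amplification.
 * Assumes equal-sized flushed segments merged mergeFactor at a time,
 * until a single segment holds all documents.
 */
public class MergeAmplification {

    /** Merge levels needed to grow flushed segments into one segment of totalDocs. */
    static int mergeLevels(long totalDocs, long flushDocs, int mergeFactor) {
        if (flushDocs <= 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("flushDocs must be > 0 and mergeFactor >= 2");
        }
        int levels = 0;
        long segmentDocs = flushDocs;
        while (segmentDocs < totalDocs) {
            segmentDocs *= mergeFactor; // each level produces segments mergeFactor times larger
            levels++;
        }
        return levels;
    }

    /** Approximate writes per indexed byte: the initial flush plus one rewrite per merge level. */
    static int writeAmplification(long totalDocs, long flushDocs, int mergeFactor) {
        return 1 + mergeLevels(totalDocs, flushDocs, mergeFactor);
    }

    /** Approximate reads per indexed byte: each merge level reads its input segments once. */
    static int readAmplification(long totalDocs, long flushDocs, int mergeFactor) {
        return mergeLevels(totalDocs, flushDocs, mergeFactor);
    }

    public static void main(String[] args) {
        long totalDocs = 100_000_000L; // hypothetical corpus size
        int mergeFactor = 10;          // assumed merge width
        // Compare committing (and thus flushing) every 100k docs vs. every 1M docs.
        for (long flushDocs : new long[] {100_000L, 1_000_000L}) {
            System.out.printf("flush every %d docs: ~%dX writes, ~%dX reads%n",
                flushDocs,
                writeAmplification(totalDocs, flushDocs, mergeFactor),
                readAmplification(totalDocs, flushDocs, mergeFactor));
        }
    }
}
```

Under this model, flushing every 100k docs gives about 3 merge levels versus 2 for every 1M docs, which is why a larger RAM buffer and rarer commits help. In either case it is single-digit amplification, nowhere near ~500X.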