I have a very large corpus that I am storing across many indexes: 200 indexes of ~500MB each, with 10^6 very small documents in each. (I could look into optimizing this later, of course, but it seems OK for now.)
During indexing, I have been using a RAMDirectory to buffer many thousands of documents in memory before flushing them to disk with IndexWriter.addIndexes. For the most part this works very well, except that performance degrades tremendously over time because of the implicit call (or two!) to optimize() inside addIndexes.

I searched the archives and found that this topic has come up a number of times over the years, but with very few answers, apart from one amusing reply from Doug: "I don't recall exactly why this was done. (I should have written a comment!)" :)

Is there a way to accomplish what I'm trying to do without all the calls to optimize()?

-Ben
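
P.S. To make the slowdown concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes each addIndexes call rewrites the entire on-disk index once via optimize(), and it compares that with a hypothetical append-only flush that writes only the new batch. The function name and the figures (100 flushes of 5MB each, reaching ~500MB) are made up for illustration.

```python
def mb_written(num_flushes, batch_mb, optimize_every_flush):
    """Return total MB written to disk after num_flushes buffered batches.

    Assumption: an optimize rewrites the whole index into one segment,
    so its cost is proportional to the current index size.
    """
    total_written = 0
    index_size = 0
    for _ in range(num_flushes):
        index_size += batch_mb
        if optimize_every_flush:
            # The whole index, including the new batch, is rewritten.
            total_written += index_size
        else:
            # Only the new batch is written; no merging is modeled.
            total_written += batch_mb
    return total_written


# Hypothetical numbers: 100 flushes of 5MB each build a ~500MB index.
with_optimize = mb_written(100, 5, True)
append_only = mb_written(100, 5, False)
print(with_optimize, append_only)
```

Under these assumptions the optimize-on-every-flush path writes about 50 times more data than the append-only path, and the gap grows with the number of flushes, which would be consistent with the degradation I am seeing. The append-only figure ignores any later segment merging, so it is a lower bound rather than a realistic cost.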