I'd advise against calling optimize() at all in an environment whose indices are constantly updated. That's what mergeFactor is for: keep it low, and Lucene will merge segments more often on its own. If you still want to call optimize(), first find out how long it takes on an index of your size. If your lull is long enough to cover that, do it; otherwise postpone it.
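To make the "is the lull long enough" check concrete, here is a rough sketch in plain Java. Nothing in it is Lucene API: the class name OptimizeGate, its methods, and the safety factor are all hypothetical. The linear scaling from an earlier measured run is only an assumption; real optimize() times depend on segment layout and disk speed, so measure on your own index.

```java
import java.time.Duration;

/** Hypothetical helper (not part of Lucene): decides whether an optimize() run fits in a quiet window. */
public final class OptimizeGate {

    private OptimizeGate() {}

    /**
     * Estimates the optimize() duration for an index of indexBytes by scaling
     * a previously measured run linearly. Linear scaling is an assumption.
     */
    public static Duration estimate(long indexBytes, long measuredBytes, Duration measuredDuration) {
        if (measuredBytes <= 0) {
            throw new IllegalArgumentException("measuredBytes must be positive");
        }
        double scale = (double) indexBytes / (double) measuredBytes;
        return Duration.ofMillis(Math.round(measuredDuration.toMillis() * scale));
    }

    /** Returns true if the estimate, padded by safetyFactor, fits inside the lull. */
    public static boolean fitsInLull(Duration estimated, Duration lull, double safetyFactor) {
        long paddedMillis = Math.round(estimated.toMillis() * safetyFactor);
        return paddedMillis <= lull.toMillis();
    }

    /**
     * Runs optimizeCall only when it is expected to fit; returns whether it ran.
     * In a real setup optimizeCall would wrap writer.optimize() and its IOException handling.
     */
    public static boolean optimizeIfQuiet(Duration estimated, Duration lull,
                                          double safetyFactor, Runnable optimizeCall) {
        if (!fitsInLull(estimated, lull, safetyFactor)) {
            return false; // postpone until a longer lull
        }
        optimizeCall.run();
        return true;
    }
}
```

The safety factor is there because an estimate that only just fits the window is likely to overrun into the next burst of updates; when the check fails, the call is simply skipped and retried at the next lull.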
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, April 6, 2007 6:53:13 PM
Subject: optimize() method call

I was looking at the javadocs for the optimize() call on IndexWriter which contain a great amount of detail about what happens, but very little guidance on when. I would like to add more on when. I generally do optimize after I finish my indexing, which is pretty straightforward to determine when one has a more or less static collection. What isn't so clear to me, b/c I haven't dealt w/ it too much is when optimize should be called in environments that are frequently updated.

Here's what I have for text so far:

   *
   * <p>It is recommended that this method be called upon completion of indexing. In
   * environments with frequent updates optimize is best FILL IN HERE
   * </p>

Essentially, I am wondering what are the best practices for calling optimize, especially in a frequent update environment. My gut feeling is that it should just be scheduled to be done on a regular basis, ideally when there is a lull.

The docs allude to the fact that search performance will be better, but has anyone quantified it? The mergeFactor docs say that a smaller merge factor results in faster searches on unoptimized (I presume that means relatively faster searches to higher merge factors, but still not as fast as optimized, correct?) If it hasn't been quantified, maybe I will try to whip a benchmark for it.

So, do people in these types of environment typically schedule optimize to occur at night or every few hours, or what?
I know, "It depends...", just am wondering if there is a general consensus that would be useful to pass along to readers.

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ