Has anyone done any benchmarking to estimate how long it takes to optimize indexes of different sizes? Is the merging linear, sub-linear, etc.?

On Apr 8, 2007, at 1:01 AM, Otis Gospodnetic wrote:

I'd advise against calling optimize() at all in an environment whose indices are constantly updated. That's what mergeFactor helps with. Keep it low, and Lucene itself will merge segments more often. If one still wants to call optimize(), you'd want to know how long it would take with an index of your size; if you've got enough lull time, do it, otherwise postpone it.
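
To make the mergeFactor trade-off concrete, here is a minimal sketch of a toy model, not Lucene's actual merge code. It assumes every flush produces one equally sized segment and that a merge fires as soon as mergeFactor segments of the same size exist. The class MergeFactorModel and its fields are hypothetical names made up for this illustration; "copied" counts flushed batches rewritten by merges, as a rough stand-in for merge I/O.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeFactorModel {
    /** Outcome of one simulated run. */
    public static final class Result {
        public final int segments; // segments left in the index
        public final long copied;  // flushed batches rewritten by merges
        Result(int segments, long copied) {
            this.segments = segments;
            this.copied = copied;
        }
    }

    /** Simulates `flushes` segment flushes under a simplified level-based merge rule. */
    public static Result simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        // counts.get(k) = number of segments at level k;
        // a level-k segment holds mergeFactor^k flushed batches.
        List<Integer> counts = new ArrayList<>();
        long copied = 0;
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            long size = 1;
            while (true) {
                if (counts.size() <= level) {
                    counts.add(0);
                }
                int n = counts.get(level) + 1;
                if (n < mergeFactor) {
                    counts.set(level, n);
                    break;
                }
                // mergeFactor segments at this level: merge them into one
                // segment at the next level, rewriting all of their batches.
                counts.set(level, 0);
                size *= mergeFactor;
                copied += size;
                level++;
            }
        }
        int segments = 0;
        for (int c : counts) {
            segments += c;
        }
        return new Result(segments, copied);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 10, 50}) {
            Result r = simulate(1000, mf);
            System.out.println("mergeFactor=" + mf
                + " segments=" + r.segments + " copied=" + r.copied);
        }
    }
}
```

In this model a lower mergeFactor leaves fewer segments for searches to visit but rewrites more data along the way. Real numbers depend on document sizes, buffering settings and the Lucene version, so treat this as intuition only, not a benchmark.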

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, April 6, 2007 6:53:13 PM
Subject: optimize() method call

I was looking at the javadocs for the optimize() call on IndexWriter
which contain a great amount of detail about what happens, but very
little guidance on when.  I would like to add more on when.  I
generally do optimize after I finish my indexing, which is pretty
straightforward to determine when one has a more or less static
collection.  What isn't so clear to me, b/c I haven't dealt w/ it too
much is when optimize should be called in environments that are
frequently updated.

Here's what I have for text so far:
*
    * <p>It is recommended that this method be called upon completion of indexing.  In
    * environments with frequent updates optimize is best FILL IN HERE
    * </p>

Essentially, I am wondering what are the best practices for calling
optimize, especially in a frequent update environment.  My gut
feeling is that it should just be scheduled to be done on a regular
basis, ideally when there is a lull.  The docs allude to the fact
that search performance will be better, but has anyone quantified
it?  The mergeFactor docs say that a smaller merge factor results in
faster searches on unoptimized indices (I presume that means faster
relative to higher merge factors, but still not as fast as an
optimized index, correct?).  If it hasn't been quantified, maybe I will
try to whip up a benchmark for it.

So, do people in these types of environments typically schedule
optimize to occur at night or every few hours, or what?  I know, "It
depends...", I'm just wondering if there is a general consensus that
would be useful to pass along to readers.
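
One way to picture the "run it when there is a lull" idea is a small gate that tracks the time of the last index update and only allows an optimize once the index has been quiet for a chosen period. This is a sketch under assumptions: the class LullGate, its method names and the quiet-period threshold are hypothetical and not part of Lucene.

```java
public class LullGate {
    private final long quietMillis;  // required idle time before optimizing
    private long lastUpdateMillis;   // timestamp of the most recent index update

    public LullGate(long quietMillis, long startMillis) {
        if (quietMillis < 0) {
            throw new IllegalArgumentException("quietMillis must be >= 0");
        }
        this.quietMillis = quietMillis;
        this.lastUpdateMillis = startMillis;
    }

    /** Call whenever a document is added to or deleted from the index. */
    public synchronized void recordUpdate(long nowMillis) {
        lastUpdateMillis = nowMillis;
    }

    /** True if no update has been recorded for at least quietMillis. */
    public synchronized boolean shouldOptimize(long nowMillis) {
        return nowMillis - lastUpdateMillis >= quietMillis;
    }
}
```

A periodic task could call shouldOptimize(System.currentTimeMillis()) and invoke optimize() on the IndexWriter only when it returns true, postponing otherwise. Picking the quiet period still needs a measurement of how long optimize takes on an index of that size.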

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ






---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/