Re: optimize() method call

Grant Ingersoll Wed, 18 Apr 2007 13:29:54 -0700

Has anyone done any benchmarking to approximate how long it takes to optimize different size indexes? Is the merging linear, sub-linear, etc.?


On Apr 8, 2007, at 1:01 AM, Otis Gospodnetic wrote:

I'd advise against calling optimize() at all in an environment whose indices are constantly updated. That's what mergeFactor helps with. Keep it low, and Lucene itself will merge segments more often. If one still wants to call optimize(), you'd want to know how long it would take on an index of your size, and if you've got enough lull time, do it; otherwise postpone it.
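
To make the mergeFactor trade-off concrete, here is a rough sketch in plain Java. It does not use Lucene at all. It assumes a simplified model in which every added document starts as its own one-document segment and mergeFactor equal-sized segments are merged as soon as they exist. The class and method names are made up for illustration, and "docs copied" is only a crude stand-in for merge cost.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy model of logarithmic segment merging; NOT Lucene code.
class MergeFactorSketch {

    /** Returns {segments left in the index, docs copied by all merges}. */
    static long[] simulate(int numDocs, int mergeFactor) {
        Deque<Long> segments = new ArrayDeque<>(); // head = newest segment
        long copied = 0;
        for (int i = 0; i < numDocs; i++) {
            segments.push(1L); // assumption: each doc is flushed as a 1-doc segment
            // Cascade: merge while the newest mergeFactor segments are equal-sized.
            while (newestAreEqual(segments, mergeFactor)) {
                long merged = 0;
                for (int j = 0; j < mergeFactor; j++) {
                    merged += segments.pop();
                }
                copied += merged; // every doc in the merge is rewritten once
                segments.push(merged);
            }
        }
        return new long[] {segments.size(), copied};
    }

    private static boolean newestAreEqual(Deque<Long> segments, int mergeFactor) {
        if (segments.size() < mergeFactor) {
            return false;
        }
        Iterator<Long> it = segments.iterator(); // iterates newest first
        long size = it.next();
        for (int j = 1; j < mergeFactor; j++) {
            if (it.next() != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 10}) {
            long[] r = simulate(999, mf);
            System.out.println("mergeFactor=" + mf + " segments=" + r[0]
                    + " docsCopied=" + r[1]);
        }
    }
}
```

In this model a low mergeFactor leaves fewer segments for searches to visit, but rewrites documents more often along the way, which is the cost of having Lucene merge more often. Real indexes buffer documents and have segments of varying size, so treat the numbers as a shape, not a prediction.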

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, April 6, 2007 6:53:13 PM
Subject: optimize() method call

I was looking at the javadocs for the optimize() call on IndexWriter
which contain a great amount of detail about what happens, but very
little guidance on when.  I would like to add more on when.  I
generally do optimize after I finish my indexing, which is pretty
straightforward to determine when one has a more or less static
collection.  What isn't so clear to me, b/c I haven't dealt w/ it too
much is when optimize should be called in environments that are
frequently updated.

Here's what I have for text so far:
*
    * <p>It is recommended that this method be called upon completion
    * of indexing.  In
    * environments with frequent updates optimize is best FILL IN HERE
    * </p>

Essentially, I am wondering what are the best practices for calling
optimize, especially in a frequent update environment.  My gut
feeling is that it should just be scheduled to be done on a regular
basis, ideally when there is a lull.  The docs allude to the fact
that search performance will be better, but has anyone quantified
it?  The mergeFactor docs say that a smaller merge factor results in
faster searches on unoptimized indices (I presume that means faster
searches relative to higher merge factors, but still not as fast as
optimized, correct?).  If it hasn't been quantified, maybe I will try
to whip up a benchmark for it.

So, do people in these types of environments typically schedule
optimize to occur at night or every few hours, or what?  I know, "It
depends...", I'm just wondering if there is a general consensus that
would be useful to pass along to readers.
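
One way to picture the "optimize when there is a lull" idea is the sketch below. It is only an illustration, not an established Lucene practice. The class name, the quiet-period threshold and the Runnable are all hypothetical; in a real application the Runnable would presumably wrap the call to optimize() on the IndexWriter.

```java
// Hypothetical helper: runs a task only after updates have been quiet for a while.
class LullOptimizer {
    private final long quietMillis;      // assumed threshold for "a lull"
    private final Runnable optimizeTask; // e.g. a wrapper around optimize()
    private long lastUpdateMillis;
    private boolean dirty;               // true if updates arrived since last run

    LullOptimizer(long quietMillis, Runnable optimizeTask) {
        this.quietMillis = quietMillis;
        this.optimizeTask = optimizeTask;
    }

    /** Call whenever a document is added, updated or deleted. */
    synchronized void recordUpdate(long nowMillis) {
        lastUpdateMillis = nowMillis;
        dirty = true;
    }

    /** Call periodically; runs the task if the index changed and is now quiet. */
    synchronized boolean maybeOptimize(long nowMillis) {
        if (!dirty || nowMillis - lastUpdateMillis < quietMillis) {
            return false; // nothing new to optimize, or updates are still arriving
        }
        optimizeTask.run(); // note: blocks recordUpdate() until it finishes
        dirty = false;
        return true;
    }

    public static void main(String[] args) {
        LullOptimizer lull = new LullOptimizer(1000,
                () -> System.out.println("optimizing"));
        lull.recordUpdate(0);
        System.out.println(lull.maybeOptimize(500));  // still within the quiet period
        System.out.println(lull.maybeOptimize(1500)); // quiet long enough
    }
}
```

A timer (for instance a ScheduledExecutorService) could call maybeOptimize(System.currentTimeMillis()) every few minutes. Whether that beats a fixed nightly run depends on the update pattern and on how long optimize takes for the index in question.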

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


