vivek sar wrote:
Thanks Mike for the insight. I did check the stdout log and found it
was complaining of not having enough disk space. I thought we needed
only 2x the index size. Our index size is 10G (max) and we had 45G
left on that partition - should it still complain about the space?
Is there a reader open on the index while optimize is running? That
potentially ties up another 1X.
Are you certain you're closing all previously open readers?
On Linux, because the file-deletion semantics are "delete on last
close", it's hard to detect when you have IndexReaders still open: an
"ls" won't show the deleted files, yet they still consume bytes on
disk until the last open file handle is closed. You can try running
"lsof" while optimize is running to see which files are held open.
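For example, here's a minimal sketch of the reader-swapping pattern I
mean (the "dir" and "currentReader" names are hypothetical; the point
is that the old reader gets closed once the new one is in place):

  // open a fresh reader, swap it in, then close the old one so its
  // file handles (and the deleted files they pin on disk) are released
  IndexReader newReader = IndexReader.open(dir);
  IndexReader oldReader = currentReader;
  currentReader = newReader;
  if (oldReader != null) {
    oldReader.close();
  }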
Also, if you can call IndexWriter.setInfoStream(...) for all of the
operations below, I can peek at it to try to see why it's using up so
much intermediate disk space.
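For example (just a sketch; the file name is arbitrary, and any
PrintStream, including System.out, will do):

  // capture IndexWriter's verbose diagnostics (merges, flushes, file
  // sizes) so we can see what is consuming the intermediate disk space
  PrintStream infoLog = new PrintStream(new FileOutputStream("iw-info.log", true));
  masterWriter.setInfoStream(infoLog);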
Some comments/questions on other issues you raised:
We have 2 threads that index the data into two different indexes and
then we merge them into a master index with the following call:
masterWriter.addIndexesNoOptimize(indices);
Once the smaller indices have been merged into the master index, we
delete them.
This process runs every 5 minutes. The master index can grow up to
10G before we partition it - move it to another directory and start a
new master index.
Every hour we then optimize the master index using:
writer.optimize(optimizeSegment); // where optimizeSegment = 10
How long does that optimize take? And what do you do with the
every-5-minutes job while optimize is running? Do you run it anyway,
sharing the same writer (i.e., calling addIndexesNoOptimize while
another thread is running the optimize)?
Here are my questions,
1) Is this process flawed in terms of performance and efficiency? What
would you recommend?
Actually I think your approach is the right approach.
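Just to make sure we're talking about the same thing, here's a sketch
of that sequence as I understand it (the Directory variables are
hypothetical stand-ins for your two per-thread indexes):

  // every 5 minutes: merge the two small indexes into the master,
  // then delete the small ones
  Directory[] indices = new Directory[] { threadDir1, threadDir2 };
  masterWriter.addIndexesNoOptimize(indices);

  // every hour: partially optimize the master down to <= 10 segments
  masterWriter.optimize(10);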
2) When you say "partial optimize" what do you mean by that?
Actually, it's what you're already doing (passing 10 to optimize).
This means the index just has to reduce itself to <= 10 segments,
instead of the normal 1 segment for a full optimize.
Still, I find that particular merge somewhat odd: it was merging 7
segments, the first of which was immense, and the final 6 were tiny.
That's not an efficient merge to do. Seeing the infoStream output
might help explain what led to it...
3) In Lucene 2.3 "segment merging is done in a background thread" -
how does it work, i.e., how does it know which segments to merge? What
would cause this background merge exception?
The selection of which segments to merge, and when, is done by
LogByteSizeMergePolicy, which you can swap out for your own merge
policy (this should not in general be necessary). Once a merge is
selected, the execution of that merge is controlled by
ConcurrentMergeScheduler, which runs merges in background threads.
You can also swap that out (e.g., for SerialMergeScheduler, which does
the merging in the foreground thread, like Lucene did before 2.3).
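For example, these are the 2.3 defaults, shown only so you can see
which setters you'd use if you ever did want to swap them out (a
sketch, not something you need to add):

  // explicitly install the default merge policy and merge scheduler
  writer.setMergePolicy(new LogByteSizeMergePolicy());
  writer.setMergeScheduler(new ConcurrentMergeScheduler());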
I think the background merge exception is most often the disk filling
up, but in general it can be anything that went wrong while merging.
Such exceptions won't corrupt your index, because the merge only
commits its changes to the index if it completes successfully.
4) Can we turn off "background merge" if I'm running the optimize
every hour in any case? How do we turn it off?
Yes: IndexWriter.setMergeScheduler(new SerialMergeScheduler()) gets
you back to the old (fg thread) way of running merges. But in general
this gets you worse net performance, unless you are already using
multiple threads when adding documents.
Mike