Steven Parkes wrote:
I'm not certain, but would parts of your goal be achieved by the work
I've seen floating around Jira to refactor the MergePolicy so that it can
be handled by multiple threads?

Well, in what I've been working on for LUCENE-847 (merge policy
factoring) and LUCENE-870 (concurrent merge policy), what Michael's
talking about really wouldn't be affected.

The way I envision factoring the merge policy, the policy doesn't get
involved in the actual merge itself. It simply defines what merges will
occur. (This makes the merge policy variants very clean and gets them
out of the segment merging which is a bit tricky.) So since Michael is
asking for a way to abort an in-flight merge, the merge policy really
doesn't get involved.

Exactly. The merge policy decides *when* to merge. For the shutdown feature, however, we want to be able to stop an ongoing merge.

(Well, it does a little: the merge policy will in
general generate, from the abstract merge or optimize request, a sequence
of individual merges, each generating a new segment, so it could check
for an abort between individual merge operations. However, since a single
merge operation of large segments can take a long time, this isn't
sufficient to bound the time.)
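
To make the limitation concrete, here is a minimal sketch of a check made only between individual merges. It is not Lucene code: `StagedMergeRunner`, `runMerges`, and the use of `Runnable` to stand in for a single merge are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration only; none of these names exist in Lucene. */
class StagedMergeRunner {
    /**
     * Runs the merges in order and returns how many completed.
     * The stop flag is consulted only between merges, so a merge that
     * has already started always runs to completion.
     */
    static int runMerges(List<Runnable> merges, AtomicBoolean stopRequested) {
        int completed = 0;
        for (Runnable merge : merges) {
            if (stopRequested.get()) {
                break; // stop before starting the next merge
            }
            merge.run(); // no way to interrupt this call from here
            completed++;
        }
        return completed;
    }
}
```

With this shape, the worst-case shutdown delay is the duration of the longest single merge, which is exactly the problem for large segments.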

Yes, we could do this already with the current merge policy in IndexWriter, but you are right, a single merge operation can already take too long.

I thought about this when the commit/rollback stuff got added to
IndexWriter. At that point, all it would take to get an immediate abort
would be to convince the bottom writer to throw an IOException, which
it looks like is effectively what Michael is talking about, at least for
the FSDirectory case.
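
As a rough sketch of that idea, using only java.io rather than Lucene's own output classes: a wrapper stream that starts throwing IOException from every write once a shared abort flag is set. `AbortableOutputStream` and the flag are hypothetical and are not taken from Michael's code or from Lucene.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper, not part of Lucene: fails writes once an abort flag is set. */
class AbortableOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final AtomicBoolean aborted;

    AbortableOutputStream(OutputStream delegate, AtomicBoolean aborted) {
        this.delegate = delegate;
        this.aborted = aborted;
    }

    // Called before every write; the exception unwinds whatever is writing.
    private void checkAborted() throws IOException {
        if (aborted.get()) {
            throw new IOException("write aborted by shutdown request");
        }
    }

    @Override
    public void write(int b) throws IOException {
        checkAborted();
        delegate.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        checkAborted();
        delegate.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

Another thread sets the flag, and the next write in the in-flight merge fails. Whether the index is left in a clean state afterwards depends on the caller handling that exception the way it would handle any other I/O failure.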

So my thoughts:

I think something like what Michael has suggested is a good idea, but I
would be in favor of putting it in the core, rather than making it a
derived thing for a single Directory implementation. Seems to me like
it's a pretty small code change for a very nice thing to have. Doesn't
seem to add much complexity.

Okay, it seems that this is a desired feature, so I will go ahead and open a Jira issue. I will attach the code that I have so far, even though it extends IndexWriter and FSDirectory and lacks test cases.

As to what happens in the middle of a merge or optimize: I think it
might depend on the autoCommit flag.

In either case we have to ensure that the buffered docs get flushed to disk.

Since an optimize may be done in
stages, whether the intermediary stages are kept or not is going to
depend on when the segments file gets updated (and I haven't checked the
current status of this). I can see it either way: keeping partial work
(to resume) or throwing everything away on a shutdown.

Good idea. I'm not too familiar with the new autoCommit code yet. I implemented the shutdown code before autoCommit was added. Will look into that...
