Hi,
If you run Lucene as a service, you want to be able to shut it down
within a certain period of time (usually 1-2 minutes). This can be a
problem if the IndexWriter is in the middle of a merge when the service
shutdown request is received.
Therefore it would be nice if we had a method in IndexWriter called,
e.g., shutdown(), which satisfies the following two requirements:
- if a merge is happening, abort it
- flush the buffered docs but do not trigger a merge
The latter is easy: we just need a flush method that does not trigger a
merge. That's a two-line change in IndexWriter.
The former is more complex. The first way of implementing this that came
to my mind was to add checks to the different merge loops, like "only
continue if shutdown hasn't been called yet". The obvious drawback of
this approach is a performance impact and the need to make code changes
in different places: merging fields, merging postings, merging
term vectors, writing compound files. So I think this is quite an ugly
approach.
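
To make the polling idea concrete, here is a minimal sketch of what such
a check inside one merge loop could look like. It is not Lucene code:
PollingMerger, shutdown() and the list-of-strings "segments" are
hypothetical stand-ins chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical toy merger, not part of Lucene: shows the "check a flag in
// every merge loop" approach on plain lists of strings.
final class PollingMerger {
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    // Called from the thread that handles the service shutdown request.
    void shutdown() {
        shutdownRequested.set(true);
    }

    // Merges all segments into one list; returns null if the merge was
    // abandoned because shutdown() was called in the meantime.
    List<String> merge(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String doc : segment) {
                // The extra check that every merge loop would need.
                if (shutdownRequested.get()) {
                    return null;
                }
                merged.add(doc);
            }
        }
        return merged;
    }
}
```

In the real code the same check would have to be repeated in each of the
loops listed above, which is what makes this variant unattractive.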
The approach I implemented is sort of a hack, but I'd like to describe
it briefly here. I extended the FSDirectory and FSIndexOutput:
  public static class ExtendedFSDirectory extends FSDirectory {
    // volatile: set by the shutdown thread, read by the writing thread
    private volatile boolean interrupted = false;

    public void interrupt() {
      this.interrupted = true;
    }

    public void clearInterrupt() {
      this.interrupted = false;
    }

    public IndexOutput createOutput(String name) throws IOException {
      File file = new File(getFile(), name);
      if (file.exists() && !file.delete()) // delete existing, if any
        throw new IOException("Cannot overwrite: " + file);

      return new FSIndexOutput(file) {
        public void flushBuffer(byte[] b, int offset, int size)
            throws IOException {
          // refuse to write once interrupt() has been called
          if (ExtendedFSDirectory.this.interrupted) {
            throw new IndexWriterInterruptException();
          }
          super.flushBuffer(b, offset, size);
        }
      };
    }
  }
  // This exception is used to signal an interrupt request
  static final class IndexWriterInterruptException extends IOException {
    private static final long serialVersionUID = 1L;
  }
So now FSIndexOutput.flushBuffer() throws an
IndexWriterInterruptException if interrupt() has been called. This
causes the IndexWriter to abort the merge and roll back the transaction.
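
The same mechanism can be tried out without Lucene. The following is
only a sketch using plain java.io: InterruptibleOutputStream and
WriteInterruptedException are hypothetical names standing in for the
overridden FSIndexOutput and for IndexWriterInterruptException.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for IndexWriterInterruptException.
final class WriteInterruptedException extends IOException {
    private static final long serialVersionUID = 1L;

    WriteInterruptedException() {
        super("write interrupted by shutdown request");
    }
}

// Hypothetical stand-in for the overridden FSIndexOutput: every write
// first checks a flag shared with the thread that requests the shutdown.
final class InterruptibleOutputStream extends FilterOutputStream {
    private final AtomicBoolean interrupted;

    InterruptibleOutputStream(OutputStream out, AtomicBoolean interrupted) {
        super(out);
        this.interrupted = interrupted;
    }

    @Override
    public void write(int b) throws IOException {
        checkInterrupted();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        checkInterrupted();
        out.write(b, off, len);
    }

    // Throws once the shared flag has been set; nothing is written then.
    private void checkInterrupted() throws IOException {
        if (interrupted.get()) {
            throw new WriteInterruptedException();
        }
    }
}
```

Because the check sits at the single point where bytes reach the file,
the code that produces those bytes does not need to know about it.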
I have another class that extends IndexWriter and overrides the
addDocument() and updateDocument() methods. In these methods I catch the
IndexWriterInterruptException. If it is thrown,
IndexWriter.flushRamSegments(boolean triggerMerge) is called with
triggerMerge=false.
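
A toy model of this catch-and-flush logic is sketched below. It does not
use Lucene: ToyWriter, ShutdownAwareWriter and InterruptSignal are
hypothetical stand-ins, and the thresholds (flush after 2 buffered docs,
merge at 2 segments) are arbitrary. Clearing the interrupt flag before
the final flush is an assumption of this sketch; otherwise the flush
itself would be rejected by the same check.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexWriterInterruptException.
final class InterruptSignal extends IOException {
    private static final long serialVersionUID = 1L;
}

// Toy stand-in for IndexWriter: buffers docs, flushes them as "segments",
// and merges all segments into one when enough have accumulated.
class ToyWriter {
    final List<String> buffered = new ArrayList<>();
    final List<List<String>> segments = new ArrayList<>();
    volatile boolean interrupted = false;
    int mergeCount = 0;

    void addDocument(String doc) throws IOException {
        buffered.add(doc);
        if (buffered.size() >= 2) {
            flushRamSegments(true);
        }
    }

    // Mirrors the proposed flushRamSegments(boolean triggerMerge).
    void flushRamSegments(boolean triggerMerge) throws IOException {
        if (buffered.isEmpty()) {
            return;
        }
        segments.add(write(buffered)); // may throw; buffer stays intact then
        buffered.clear();
        if (triggerMerge && segments.size() >= 2) {
            merge();
        }
    }

    private void merge() throws IOException {
        List<String> all = new ArrayList<>();
        for (List<String> segment : segments) {
            all.addAll(segment);
        }
        List<String> merged = write(all); // may throw; segments unchanged then
        segments.clear();
        segments.add(merged);
        mergeCount++;
    }

    // Stand-in for writing through the interruptible directory.
    private List<String> write(List<String> docs) throws IOException {
        if (interrupted) {
            throw new InterruptSignal();
        }
        return new ArrayList<>(docs);
    }
}

// Toy stand-in for the IndexWriter subclass described above.
class ShutdownAwareWriter extends ToyWriter {
    void interrupt() {
        interrupted = true;
    }

    @Override
    void addDocument(String doc) throws IOException {
        try {
            super.addDocument(doc);
        } catch (InterruptSignal e) {
            interrupted = false;     // assumption: clear the flag first
            flushRamSegments(false); // flush buffered docs, no merge
        }
    }
}
```

In this model, a document added after interrupt() still ends up in a
flushed segment, but no merge is started even though the merge threshold
has been reached.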
An advantage of this implementation is that almost all changes can be
made on top of Lucene. The only core change is the protected method
flushRamSegments(boolean triggerMerge) in IndexWriter.
My question is whether people think the shutdown feature is something we
would like to add to the Lucene core. If yes, I can go ahead and attach
my code to a JIRA issue; if not, I'd like to make the small change to
IndexWriter (add the protected method flushRamSegments(triggerMerge)).
My approach seems to work quite well, but maybe others (e.g. the
IndexWriter "experts") have different/better ideas on how to implement it.
Thanks,
Michael