Hi,

If you run Lucene as a service, you want to be able to shut it down within a bounded period of time (usually 1-2 minutes). This can be a problem if the IndexWriter is in the middle of a merge when the shutdown request is received.

Therefore it would be nice if IndexWriter had a method, e.g. shutdown(), that satisfies the following two requirements:
- if a merge is happening, abort it
- flush the buffered docs but do not trigger a merge

The latter is easy: we just need a flush method that does not trigger a merge. That's a two-line change in IndexWriter.

The former is more complex. The first implementation that came to my mind was to add checks to the different merge loops, like "only continue if shutdown hasn't been called yet". The obvious drawbacks of this approach are the performance impact and the need to change code in several places: merging fields, merging postings, merging term vectors, and writing compound files. So I think this is quite an ugly approach.

The approach I implemented is sort of a hack, but I'd like to describe it briefly here. I extended the FSDirectory and FSIndexOutput:

   public static class ExtendedFSDirectory extends FSDirectory {
       // volatile: set by the thread requesting shutdown, read by the writing thread
       private volatile boolean interrupted = false;

       public void interrupt() {
           this.interrupted = true;
       }

       public void clearInterrupt() {
           this.interrupted = false;
       }

       public IndexOutput createOutput(String name) throws IOException {
           File file = new File(getFile(), name);
           if (file.exists() && !file.delete()) // delete existing, if any
               throw new IOException("Cannot overwrite: " + file);

           return new FSIndexOutput(file) {
               public void flushBuffer(byte[] b, int offset, int size) throws IOException {
                   // abort the write as soon as an interrupt has been requested
                   if (ExtendedFSDirectory.this.interrupted) {
                       throw new IndexWriterInterruptException();
                   }
                   super.flushBuffer(b, offset, size);
               }
           };
       }
   }

   // This exception is used to signal an interrupt request
   static final class IndexWriterInterruptException extends IOException {
       private static final long serialVersionUID = 1L;
   }

So now FSIndexOutput.flushBuffer() throws an IndexWriterInterruptException if interrupt() has been called. This causes the IndexWriter to abort the merge and to roll back the transaction.
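
To illustrate the mechanism in isolation, here is a minimal sketch that does not depend on Lucene. It uses a plain java.io.OutputStream in place of FSIndexOutput; the names InterruptibleSink and InterruptedWriteException are made up for this example and are not part of my patch or of Lucene.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for IndexWriterInterruptException.
class InterruptedWriteException extends IOException {
    private static final long serialVersionUID = 1L;
}

// Hypothetical stand-in for the extended directory: it owns the flag and
// hands out streams that check the flag before every write.
class InterruptibleSink {
    // volatile so that a flag set by another thread is seen by the writer
    private volatile boolean interrupted = false;

    void interrupt() {
        interrupted = true;
    }

    void clearInterrupt() {
        interrupted = false;
    }

    OutputStream wrap(final OutputStream target) {
        return new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                checkInterrupted();
                target.write(b);
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                checkInterrupted();
                target.write(b, off, len);
            }

            @Override
            public void flush() throws IOException {
                target.flush();
            }

            @Override
            public void close() throws IOException {
                target.close();
            }
        };
    }

    // Throws before any byte is written once an interrupt was requested.
    private void checkInterrupted() throws IOException {
        if (interrupted) {
            throw new InterruptedWriteException();
        }
    }
}
```

Because the check sits at the lowest write level, the long-running code above it needs no changes: it simply sees an IOException on its next write.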

I have another class that extends IndexWriter and overrides the addDocument() and updateDocument() methods. In these methods I catch the IndexWriterInterruptException. If it is thrown, IndexWriter.flushRamSegments(boolean triggerMerge) is called with triggerMerge=false. An advantage of this implementation is that almost all changes can be made on top of Lucene. The only core change is the protected method flushRamSegments(boolean triggerMerge) in IndexWriter.
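
The control flow can be sketched roughly as follows. This is a toy model, not my actual code: ToyWriter is a hypothetical, heavily simplified stand-in for IndexWriter, documents are plain strings, and the interrupted field stands in for the flag on the directory. Only the name flushRamSegments(boolean triggerMerge) is taken from the proposal above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexWriterInterruptException.
class MergeInterruptException extends IOException {
    private static final long serialVersionUID = 1L;
}

// Hypothetical, heavily simplified stand-in for IndexWriter.
class ToyWriter {
    final List<String> buffered = new ArrayList<String>();
    final List<List<String>> segments = new ArrayList<List<String>>();
    volatile boolean interrupted = false; // stands in for the directory's flag
    private final int maxBufferedDocs;

    ToyWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) throws IOException {
        buffered.add(doc);
        if (buffered.size() >= maxBufferedDocs) {
            flushRamSegments(true);
        }
    }

    // Writes the buffered docs as a new segment; merges only if asked to.
    protected void flushRamSegments(boolean triggerMerge) throws IOException {
        if (!buffered.isEmpty()) {
            segments.add(new ArrayList<String>(buffered));
            buffered.clear();
        }
        if (triggerMerge && segments.size() > 1) {
            mergeSegments();
        }
    }

    // Fails before touching any state when interrupted, which models the
    // rollback of an aborted merge.
    private void mergeSegments() throws IOException {
        if (interrupted) {
            throw new MergeInterruptException();
        }
        List<String> merged = new ArrayList<String>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }
}

// The subclass swallows the interrupt and flushes without merging.
class ShutdownAwareWriter extends ToyWriter {
    ShutdownAwareWriter(int maxBufferedDocs) {
        super(maxBufferedDocs);
    }

    @Override
    void addDocument(String doc) throws IOException {
        try {
            super.addDocument(doc);
        } catch (MergeInterruptException e) {
            flushRamSegments(false); // persist buffered docs, skip the merge
        }
    }
}
```

In this model an interrupted merge leaves the existing segments untouched, and the caller of addDocument() never sees the interrupt exception.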

My question is whether people think the shutdown feature is something we would like to add to the Lucene core. If yes, I can go ahead and attach my code to a JIRA issue; if no, I'd still like to make the small change to IndexWriter (add the protected method flushRamSegments(triggerMerge)). My approach seems to work quite well, but maybe others (e.g. the IndexWriter "experts") have different or better ideas for how to implement it.

Thanks,
Michael
