On Mon, Dec 16, 2013 at 7:33 AM, Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
> I am trying to model a transaction-log for lucene, which creates a
> transaction-log per-commit
>
> Things work fine during normal operations, but I cannot fathom the effect
> during
>
> a. IOException during Index-Commit
>
> Will the index be restored to previous commit-point? Can I blindly re-try
> operations from the current transaction log, after some time interval?
Yes: if an IOException is thrown from IndexWriter.commit then the commit failed and the index still "shows" the previous successful commit.

> b. IOException during Background-Flush
>
> Will all the RAM buffers including deletes for that DWPT be cleaned up?
> flush() being per-thread and async obviously has problems with my
> transaction-log-per-commit approach, right?
>
> Most of the time, the IOExceptions are temporary and recoverable [Ex:
> Solr's HDFSDirectory etc...]. So, I must definitely retry these operations
> after some time-interval.

IOExceptions during flush are trickier. Often it will mean all documents assigned to that segment are lost, but not necessarily (e.g. if the IOE happened while creating a compound file).

IOExceptions during add/updateDocument are also possible (e.g. stored fields and term vectors are written per document), which can result in losing all documents in that one segment as well (an aborting exception). An IOE thrown by the analyzer, on the other hand, will just result in that one document being lost (a non-aborting exception).

Since you cannot know which case it was, it's probably safest to define a primary key field and always use IW.updateDocument (IndexWriter.updateDocument) keyed on that field. This way, if the document was in fact not lost and you re-index it, you just replace it instead of creating a duplicate.

Mike McCandless

http://blog.mikemccandless.com
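A minimal sketch of the replay-with-updateDocument advice, written in plain Java so it runs without Lucene on the classpath. Everything in it (`ReplaySketch`, `ToyIndex`, `Op`, `replayAndCommit`, the `failNextCommit` switch) is hypothetical and only models the behavior described in the reply: a failed commit leaves the previous commit visible, and documents buffered before the failure may or may not have survived. In this toy they happen to survive, which is the case where a blind replay creates duplicates. `updateDocument` is modeled as delete-by-key then add, the role IndexWriter.updateDocument plays when given a Term on the primary-key field; it is not Lucene's implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (NOT Lucene's API): commit either succeeds or leaves
// the last successful commit visible; ops since then live in a transaction log.
public class ReplaySketch {

    /** One logged operation: a primary key and the document body. */
    record Op(String id, String body) {}

    static class ToyIndex {
        private final List<Op> committed = new ArrayList<>();        // what a reader would see
        private final List<Op> pending = new ArrayList<>();          // buffered since last commit
        private final Set<String> pendingDeletes = new HashSet<>();  // keys to delete at commit
        boolean failNextCommit = false; // hypothetical switch simulating a transient IOException

        /** Append-only: replaying the same op twice yields two documents. */
        void addDocument(Op doc) {
            pending.add(doc);
        }

        /** Delete-by-key, then add: the replace semantics that make replay safe. */
        void updateDocument(Op doc) {
            pending.removeIf(d -> d.id().equals(doc.id())); // drop buffered docs with this key
            pendingDeletes.add(doc.id());                    // and committed ones at commit time
            pending.add(doc);
        }

        void commit() throws IOException {
            if (failNextCommit) {
                failNextCommit = false;
                // Committed state is untouched; buffered docs are kept on purpose
                // to model the "document was not actually lost" case.
                throw new IOException("simulated transient failure");
            }
            committed.removeIf(d -> pendingDeletes.contains(d.id()));
            committed.addAll(pending);
            pending.clear();
            pendingDeletes.clear();
        }

        int visibleCount() {
            return committed.size();
        }
    }

    /** Re-applies the whole log and commits, retrying with a simple linear backoff. */
    static void replayAndCommit(ToyIndex index, List<Op> log, boolean useUpdate, int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                for (Op op : log) {
                    if (useUpdate) {
                        index.updateDocument(op);
                    } else {
                        index.addDocument(op);
                    }
                }
                index.commit();
                return; // success: this commit's log could now be discarded
            } catch (IOException e) {
                last = e;
                Thread.sleep(10L * attempt); // wait a little longer before each retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        List<Op> log = List.of(new Op("1", "a"), new Op("2", "b"));

        ToyIndex withAdd = new ToyIndex();
        withAdd.failNextCommit = true; // first attempt fails, second succeeds
        replayAndCommit(withAdd, log, false, 3);
        System.out.println("add + replay:    " + withAdd.visibleCount()); // prints 4

        ToyIndex withUpdate = new ToyIndex();
        withUpdate.failNextCommit = true;
        replayAndCommit(withUpdate, log, true, 3);
        System.out.println("update + replay: " + withUpdate.visibleCount()); // prints 2
    }
}
```

With append-only adds, replaying after the simulated failure leaves four visible documents; with delete-then-add it leaves two, and replaying the same log once more still leaves two. The fixed linear backoff is an arbitrary choice for the sketch, not a recommendation from the thread.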