Hi All,

An application of ours under development had a memory leak that caused
it to slow to a crawl.  On Linux, the application did not respond to
kill -15 in a reasonable time, so kill -9 was used to forcibly terminate
it.  After this, the segments file contained a reference to a segment
whose index files were not present.  I.e., the index was corrupt and
Lucene could not open it.

A thread dump at the time of the kill -9 shows that Lucene was merging
segments inside IndexWriter.close().  Since segment merging only commits
(updates the segments file) after the newly merged segment(s) are
complete, I expect the interrupted merge itself is not the actual
problem.

Could a kill -9 prevent data from reaching disk for files that were
previously closed?  If so, then Lucene's index can become corrupt after
kill -9.  In this case, it is possible that a prior merge created new
segment index files, updated the segments file, closed everything, the
segments file made it to disk, but the index data files and/or their
directory entries did not.

If this is the case, it seems to me that flush() and
FileDescriptor.sync() are required on each index file prior to close()
to guarantee no corruption.  A FileDescriptor.sync() is probably also
required on the index directory to ensure the directory entries have
been persisted.
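
To make the per-file part of this concrete, here is a minimal sketch of
the flush-then-sync-then-close sequence.  The class SyncedWrite and the
method writeDurably are hypothetical names made up for illustration;
they are not Lucene code, and this is not how Lucene currently writes
its files.  Only standard java.io calls are used.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of Lucene.
public class SyncedWrite {

    // Writes data to file and asks the OS to force it to the storage
    // device before the stream is closed.
    public static void writeDurably(File file, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(data);
            out.flush();         // push any stream-level buffering down to the OS
            out.getFD().sync();  // block until the OS reports the data as written
        } finally {
            out.close();
        }
    }
}
```

This sketch covers only the file contents.  It does not attempt to sync
the containing directory, and whether sync() truly reaches the platter
still depends on the operating system and the drive's write cache.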

A power failure or other operating system crash could cause this, not
just kill -9.

Does this seem like a possible explanation and fix for what happened? 
Could the same kind of problem happen on Windows?

If this is the issue, then how would people feel about having Lucene
call sync() a) always, or b) as an index configuration option?

I need to fix whatever happened, so I would submit a patch to resolve it.

Thanks for advice and suggestions,

Chuck

