Hi,

For a certain application, I want Lucene to be tolerant of computer crashes.
This means that if the indexing process gets killed, the index should never
become corrupt or unusable, or lose large chunks of the indexed files. I am
fine with losing a few documents, such as documents that were indexed in
memory but not yet written to disk - but not with losing the entire index,
or even entire segments.

If there are places where Lucene is not crash tolerant (in the above sense),
I would like to spend some time fixing these issues, and then send patches
for them. I was wondering if anyone is aware of such issues? I did not see
any such issues open in JIRA.

A little research done by a colleague of mine already turned up one such
bug, which I'll explain now, and if there's interest, I'll send
a patch. Is this a known issue?

The problem we noticed is how a new segments file is written, e.g. after a
merge, in SegmentInfos.write(). The code writes a new segments file,
"segments.new", and when this writing is complete, it uses
directory.renameFile() to overwrite the old segments file with the new one.
This would have been fine if directory.renameFile() were atomic - as its
javadoc even suggests - but unfortunately, it is not actually atomic:
FSDirectory.renameFile() first deletes the old file and then moves the
new file into place. Even a comment in FSDirectory.renameFile() warns
that it is *not* atomic. So, Lucene might crash just after deleting the
"segments" file but before "segments.new" has been renamed or completely
copied onto it, leaving the entire index unusable.
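
To make the failure window concrete, here is a minimal sketch of the
delete-then-move sequence. It is not the actual FSDirectory code: the class
name, the method deleteThenRename() and the crashAfterDelete flag are all
hypothetical, and the flag only simulates the process being killed between
the two steps.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameCrashWindow {
    // Hypothetical stand-in for a non-atomic rename: delete the target,
    // then move the source into place. crashAfterDelete simulates a kill
    // between the two steps.
    public static void deleteThenRename(File from, File to, boolean crashAfterDelete)
            throws IOException {
        if (to.exists() && !to.delete()) {
            throw new IOException("cannot delete " + to);
        }
        if (crashAfterDelete) {
            return; // simulated crash: old file is gone, new one is not in place
        }
        if (!from.renameTo(to)) {
            throw new IOException("cannot rename " + from + " to " + to);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("index").toFile();
        File segments = new File(dir, "segments");
        File segmentsNew = new File(dir, "segments.new");
        Files.write(segments.toPath(), "old".getBytes());
        Files.write(segmentsNew.toPath(), "new".getBytes());

        // Kill the "process" right after the old segments file is deleted.
        deleteThenRename(segmentsNew, segments, true);

        System.out.println("segments exists: " + segments.exists());
        System.out.println("segments.new exists: " + segmentsNew.exists());
    }
}
```

After the simulated crash the directory holds a complete "segments.new"
but no "segments" file at all, which is exactly the state a reader that
only looks for "segments" cannot recover from.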

The change we propose is, basically, when reading the segments file,
if we cannot read the "segments" file correctly, we should fall back
to reading the "segments.new" file and recovering from it.
And of course, the "This replacement should be atomic" comment in
Directory.renameFile() must be revised.
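
A rough sketch of the fallback idea, under stated assumptions: the class
SegmentsFallback and the method chooseSegmentsFile() are hypothetical names,
the code uses java.nio.file rather than Lucene's Directory API, and "cannot
read correctly" is simplified here to "missing or unreadable". A real patch
would also have to decide how to handle a "segments" file that exists but
has damaged contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SegmentsFallback {
    // Hypothetical helper: decide which file to read the segment list from.
    // Prefers "segments"; falls back to "segments.new" if the former is
    // missing or unreadable.
    public static Path chooseSegmentsFile(Path indexDir) throws IOException {
        Path segments = indexDir.resolve("segments");
        Path segmentsNew = indexDir.resolve("segments.new");
        if (Files.isReadable(segments)) {
            return segments;
        }
        if (Files.isReadable(segmentsNew)) {
            return segmentsNew; // recover from an interrupted rename
        }
        throw new IOException("no segments file found in " + indexDir);
    }
}
```

The fallback relies on the ordering described above: "segments" is only
deleted after "segments.new" has been completely written, so whenever
"segments" is missing, "segments.new" should be whole.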

If what I'm saying sounds logical, I'll open a JIRA entry and
propose a patch.

Is anyone aware of other crash-tolerance issues in Lucene
that I should consider working on?

Thanks,
Nadav.

--
Nadav Har'El
IBM Haifa Research Lab

