[
https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099012#comment-13099012
]
Michael McCandless commented on LUCENE-3418:
--------------------------------------------
bq. Mike, just to make sure, did you actually see this lead to corruption, or do you only suspect it?
Mark and I saw real corruption just by pulling the power plug. Once
the machine came back up, there were a bunch of 0-length files, and the
index was quite definitely corrupt ;) It was trivial to reproduce. In
one test, I watched 5 commits complete before cutting power, and on
reboot none of those commits were usable.
But with the fix I committed, I can no longer corrupt the index by
pulling the plug.
On LUCENE-3237: it still makes me somewhat nervous that we close the
fd, let time pass, then open a new fd and fsync that. It would be "safer"
if we fsync'd before closing, but this would be a challenge for Lucene.
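For comparison, the "safer" ordering can be sketched with Java NIO: write through a FileChannel and call force(true) on that same channel before it is closed. This is a minimal illustration assuming Java 7+; SyncBeforeClose and writeAndSync are hypothetical names, and this is not Lucene's code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: the sync is issued on the same descriptor that wrote the bytes.
final class SyncBeforeClose {
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to flush file content and metadata to the storage device.
            ch.force(true);
        } // the channel is closed only after force() has returned
    }
}
```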
But the most recent POSIX standard (POSIX:2008) seems to suggest this
is an OK approach:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html
I.e., if the system has _POSIX_SYNCHRONIZED_IO defined (I believe modern
Linux systems do), then the semantics make it clear that fsync applies to
the underlying file and not just to the bytes written through the fd you
have open right now.
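The deferred approach (close the writer now, later open a fresh descriptor and fsync through it) might look roughly like this. It is a minimal sketch assuming Java 7+ NIO, where FileChannel.force(true) is the JDK's portable way to request an fsync-style flush; ReopenAndSync is a hypothetical name, not Lucene's API. Whether that flush also covers bytes written through the earlier, already-closed descriptor is exactly the POSIX question discussed above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: sync a file that some other stream wrote and closed earlier.
final class ReopenAndSync {
    static void fsync(Path file) throws IOException {
        // WRITE without CREATE or TRUNCATE_EXISTING: the existing contents are
        // left intact, and a missing file raises NoSuchFileException instead of
        // silently creating an empty one.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // Relies on the OS syncing the whole file, not just this descriptor's writes.
            ch.force(true);
        }
    }
}
```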
> Lucene is not fsync'ing files on commit
> ---------------------------------------
>
> Key: LUCENE-3418
> URL: https://issues.apache.org/jira/browse/LUCENE-3418
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/store
> Affects Versions: 3.1, 3.2, 3.3, 3.4, 4.0
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Blocker
> Fix For: 3.4, 4.0
>
>
> Thanks to Hurricane Irene, when Mark's electricity became unreliable, he
> discovered that on power loss Lucene could easily corrupt the index, which
> of course should never happen...
> I was able to repro easily by pulling the plug on an Ubuntu box during
> indexing. On digging, I discovered, to my horror, that Lucene is failing to
> fsync any files, ever!
> This bug was unfortunately introduced when we committed LUCENE-2328. That
> issue added tracking, in FSDir, of which files have been closed but not
> sync'd, so that when sync is called during IW.commit we only sync those files
> that haven't already been sync'd.
> That tracking is done via the FSDir.onIndexOutputClosed callback, called when
> an FSIndexOutput is closed. The bug is that we only call it on exception
> during close:
> {noformat}
> @Override
> public void close() throws IOException {
>   // only close the file if it has not been closed yet
>   if (isOpen) {
>     boolean success = false;
>     try {
>       super.close();
>       success = true;
>     } finally {
>       isOpen = false;
>       if (!success) {
>         try {
>           file.close();
>           parent.onIndexOutputClosed(this);
>         } catch (Throwable t) {
>           // Suppress so we don't mask original exception
>         }
>       } else {
>         // BUG: on the normal (success) path onIndexOutputClosed is never called
>         file.close();
>       }
>     }
>   }
> }
> {noformat}
> So FSDir thinks no files need syncing when its sync method is called...
> I think instead we should call it up-front; better to over-sync than
> under-sync.
> The fix is trivial (move the callback up-front), but I'd love to somehow have
> a test that can catch such a bad regression in the future. Still, I think
> we can do that test separately and commit this fix first.
> Note that even though LUCENE-2328 was backported to 2.9.x and 3.0.x, this bug
> wasn't: the backport was a much simpler fix (just addressing the original
> memory leak), so the bug is present only in 3.1, 3.2, 3.3 and trunk.
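Putting the pieces of the issue together, here is a minimal sketch of the stale-file tracking with the callback moved up-front. TrackingDir, TrackedOutput, onOutputClosed and needsSync are hypothetical stand-ins for FSDir, FSIndexOutput and onIndexOutputClosed; the real Lucene classes differ, and the fsync is again approximated with FileChannel.force(true) (Java 7+ assumed).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for FSDir: remembers which closed files still need an fsync.
final class TrackingDir {
    final Path root;
    private final Set<String> staleFiles = new HashSet<>();

    TrackingDir(Path root) { this.root = root; }

    TrackedOutput createOutput(String name) throws IOException {
        return new TrackedOutput(this, name, Files.newOutputStream(root.resolve(name)));
    }

    // Plays the role of onIndexOutputClosed: mark the file as needing a sync.
    synchronized void onOutputClosed(String name) { staleFiles.add(name); }

    synchronized boolean needsSync(String name) { return staleFiles.contains(name); }

    // Called at commit time: fsync only the requested files that are still stale.
    synchronized void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            if (staleFiles.contains(name)) {
                try (FileChannel ch = FileChannel.open(root.resolve(name), StandardOpenOption.WRITE)) {
                    ch.force(true);
                }
                staleFiles.remove(name); // forget it only after the sync succeeded
            }
        }
    }
}

// Hypothetical stand-in for FSIndexOutput.
final class TrackedOutput implements AutoCloseable {
    private final TrackingDir parent;
    private final String name;
    private final OutputStream out;
    private boolean isOpen = true;

    TrackedOutput(TrackingDir parent, String name, OutputStream out) {
        this.parent = parent;
        this.name = name;
        this.out = out;
    }

    void write(byte[] b) throws IOException { out.write(b); }

    @Override
    public void close() throws IOException {
        if (!isOpen) {
            return; // already closed
        }
        isOpen = false;
        // The fix: register the file up-front, before anything can fail, so it is
        // tracked as needing a sync on both the success and the failure path.
        parent.onOutputClosed(name);
        out.close();
    }
}
```

The needsSync check also suggests the kind of regression test the issue asks for: close an output, then assert that the file is reported as stale before commit syncs it.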
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira