On Mon, Mar 30, 2015 at 8:30 PM, Peter Karman <[email protected]> wrote:
> I too suspected full disk at first, but the disk was only 50% full so that
> was not it.

Well, that can depend on the size of the index relative to the size of the
disk.  The worst case is that you need ~3x the index size during a full
consolidation down to a single segment: the existing index, temp files, and
the final rewritten version.  So if the index itself accounted for that 50%,
a full optimize could still have run the disk out of space.

However, running up against a full disk should not corrupt the index.  We
check the success of both write() and close() operations on the Unix file
descriptor and throw an exception on failure.  The atomic commit event is a
call to link() which hard links the new snapshot file from a temp file.  Prior
to that event, all OutStream objects must have been flushed and closed,
potentially triggering an exception.  So it ought to be impossible to get to
the commit event if a full disk has caused write operations to fail
for any segment file, schema file, or even the snapshot file itself.

> The snapshot was also zero.

In that case, I would suspect a system-level event -- power failure, OS
glitch, or hardware failure -- which caused dirty write blocks not to get
flushed successfully.

We could theoretically improve our defenses against index corruption under
such conditions on most systems using fsync().  However, it's not an absolute
guarantee, performance would suffer, and it would be a pain to implement.

    http://linux.die.net/man/2/fsync

Marvin Humphrey
