Much of this depends on the filesystem. There are journaling
filesystems that will never be corrupted. They use techniques similar
to those we are discussing for Lucene.
There are other ways of opening the file that control whether or not
the metadata (directory blocks, file length, etc.) is sync'd as well
as the file data.
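
As one illustration of the open modes mentioned above, java.io.RandomAccessFile accepts "rwd" (each update to file content is written synchronously) and "rws" (content and metadata are written synchronously). This is only a minimal sketch; the class and method names are made up for the example, and it is not code from Lucene.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, for illustration only.
final class SyncModes {
    // Writes data at the start of the file using the given open mode and
    // returns the file length reported afterwards.
    //   "rwd": each update to the file content is written synchronously
    //   "rws": each update to content AND metadata is written synchronously
    static long writeWithMode(File file, String mode, byte[] data) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, mode);
        try {
            raf.write(data);
            return raf.length();
        } finally {
            raf.close();
        }
    }
}
```

Whether the bytes really reach the platter still depends on the OS and the drive, as discussed elsewhere in this thread.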
On Nov 4, 2007, at 10:23 AM, Michael McCandless wrote:
Well, by calling sync() on every file before closing it (the patch in
LUCENE-1044), we should achieve this, albeit with a possibly sizable
loss of indexing performance (I'm testing that now...).
Though, I still can't figure out how to sync a directory from Java.
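
For reference, syncing a single file before closing it can be sketched in plain Java as below. This is an assumption-laden illustration of the general idea, not the LUCENE-1044 patch itself; the class and method names are hypothetical. It covers only the file's own data and does nothing about the directory entry, which is the open question above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class SyncBeforeClose {
    // Writes data to the file, asks the OS to push it to the storage device,
    // and only then closes the stream.
    static void writeAndSync(File file, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(data);
            // FileDescriptor.sync() throws SyncFailedException if the
            // buffers cannot be synchronized with the underlying device.
            out.getFD().sync();
        } finally {
            out.close();
        }
    }
}
```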
In the meantime ... one simple way to be robust to machine/OS crashes
is to keep more than just the last commit point alive in the index.
You just have to create a deletion policy that keeps all commit points
younger than X amount of time (there is an example of this in Lucene's
TestDeletionPolicy unit test).
This way if the machine crashes and segments_N is not usable you'd
still have segments_N-1 (and maybe segments_N-2, ..., if they are new
enough) to fall back to.
However, I'm not sure how large X would need to be, in practice, for
all write caches to be properly flushed. And, this will necessarily
use more disk space in your index.
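
The selection logic of such a policy can be sketched without any Lucene classes. Everything here (the Commit type, its fields, the method name) is a hypothetical stand-in and not Lucene's deletion policy API; the real example is the one in TestDeletionPolicy. The sketch assumes the commits are listed oldest first and always keeps the newest one, whatever its age.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "keep all commit points younger than X".
final class ExpireByAge {
    // Stand-in for a commit point: its segments_N file name and creation time.
    static final class Commit {
        final String segmentsFileName;
        final long timestampMillis;

        Commit(String segmentsFileName, long timestampMillis) {
            this.segmentsFileName = segmentsFileName;
            this.timestampMillis = timestampMillis;
        }
    }

    // Returns the commits that may be deleted: those older than maxAgeMillis.
    // The last (most recent) commit is never returned, so the index always
    // has at least one commit point left.
    static List<Commit> selectForDeletion(List<Commit> commitsOldestFirst,
                                          long nowMillis, long maxAgeMillis) {
        List<Commit> toDelete = new ArrayList<Commit>();
        for (int i = 0; i < commitsOldestFirst.size() - 1; i++) {
            Commit c = commitsOldestFirst.get(i);
            if (nowMillis - c.timestampMillis > maxAgeMillis) {
                toDelete.add(c);
            }
        }
        return toDelete;
    }
}
```

The value of maxAgeMillis plays the role of X above; the sketch says nothing about how large it needs to be.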
Mike
"Mark Miller" <[EMAIL PROTECTED]> wrote:
Even if we cannot guarantee durability, it would be nice if we could
guarantee a consistent index. It sounds like the only problem in a
machine with a lying drive is that you could lose a number of committed
transactions. I would much prefer that to a corrupted index. I can
always re-add what was lost much quicker than rebuilding a 5 million doc
archive. In either case, I have my choice between the two as long as the
index is guaranteed to be corruption free.
robert engels wrote:
Usually you can configure the drives so that sync() ALWAYS syncs -
drive jumpers, driver setup, or other methods. Some drives that are
battery backed and such do not need it.
Without sync() truly being a sync you could never write a database
that was resilient.
It will exact a heavier toll on performance than you might think. In
order to do it properly, all filesystem metadata must be sync'd as
well. The biggest difference is that you lose the degree of
multi-processing that is inherent when sync'ing is disabled - as the
drive (or OS) does the physical write asynchronously while the system
does other work - with sync() this is lost.
This is why in a db system, the only file that is sync'd is the log
file - all other files can be made "in sync" from the log file - and
this file is normally striped for optimum write performance. Some
systems have special "log file drives" (some even solid state, or
battery backed ram) to aid the performance.
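
The "only the log file is sync'd" idea can be sketched roughly as follows. This is a toy illustration with hypothetical names, not how any particular database implements its log: each record is appended to the log and sync'd before the call returns, and after a crash the other files would be rebuilt by replaying the log. It assumes records contain no newline characters.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical toy log, for illustration only.
final class LogOnlySync {
    // Appends one record to the log and syncs the log before returning.
    // This is the only file that pays the cost of sync().
    static void appendAndSync(File log, String record) throws IOException {
        FileOutputStream out = new FileOutputStream(log, true); // append mode
        try {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
            out.getFD().sync();
        } finally {
            out.close();
        }
    }

    // Reads the records back in order; recovery would re-apply them to
    // bring the unsynced data files back "in sync".
    static List<String> replay(File log) throws IOException {
        return Files.readAllLines(log.toPath(), StandardCharsets.UTF_8);
    }
}
```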
On Nov 4, 2007, at 8:30 AM, Yonik Seeley wrote:
On 11/4/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
apparently future operations may have completed while some past
operations have not. For example, the new segments_N file was
successfully written while say the _X.fdx file of the just-flushed
segment was not successfully written, even though Lucene had written &
closed _X.fdx before segments_N.
That should be impossible except for a machine crash. Kill -9 or a
JVM crash should have no effect on data already written.
But a sync option would be both simple and useful for people trying to
take live snapshots of an index, or to protect against machine
crashes. This isn't an absolute 100% guarantee either (so don't test
for it) - the drives often lie to the OS about data being flushed.
It's the best we can do at our level though.
http://www.google.com/search?q=fsync+drive+lies
-Yonik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]