2015-06-28 Nicolas Barbier <[email protected]>:

> It seems that org.h2.store.FileStore.sync() is only called when a
> manual CHECKPOINT SYNC is issued (and also by
> PageStore.writeVariableHeader, which doesn’t seem to be relevant
> here). There are other callers of FileDisk.force(), but they don’t
> seem relevant to the case of “fsync after writing the log”.
>
> The WriterThread, which seems to perform the time-based flushing of
> the log, only calls PageStore.flushLog(), which calls PageLog.flush(),
> etc, which in the end doesn't seem to do any fsync’ing.

This analysis seems consistent with the fact that we often get corrupt
databases after a forced computer reset (I assume that the OS
typically reorders writes rather aggressively). It does not really
explain why our databases also become corrupt after an OOM situation
(so I guess that is a separate issue).

I guess it would be good to perform some strace-testing on a trivial
testcase, to check whether H2 indeed behaves as I expect it to (i.e.,
that it omits calls to f[data]sync when I think it should). OTOH,
having you confirm or reject my proposed solution would be even better
:-).

Assuming for a second that my analysis above is correct, I would
suggest letting PageStore.flushLog() just call sync() (i.e., replacing
the current 3 lines of code in it with a call to the sync() method
declared right underneath it). Callers of flushLog seem to expect that
they can start writing non-log data right after calling it, which is
consistent with the idea that flushLog itself should perform the
fsync’ing.
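
To make the proposal concrete, here is a minimal sketch in plain
java.nio of what "flushing the log implies an fsync" means. It is not
H2 code: SketchLog, append() and writePending() are hypothetical names
that I made up for illustration. The only real JDK calls are
FileChannel.write() and FileChannel.force().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for a write-ahead log; this is NOT H2's PageLog. */
final class SketchLog implements AutoCloseable {
    private final FileChannel channel;
    // Records are buffered in memory until the log is flushed.
    private final ByteBuffer pending = ByteBuffer.allocate(4096);

    SketchLog(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Buffers a record; nothing is guaranteed to be on disk yet. */
    void append(byte[] record) throws IOException {
        if (record.length > pending.capacity()) {
            throw new IllegalArgumentException("record larger than buffer");
        }
        if (record.length > pending.remaining()) {
            writePending();
        }
        pending.put(record);
    }

    /** Hands the buffered bytes to the OS (write(2)), without syncing. */
    private void writePending() throws IOException {
        pending.flip();
        while (pending.hasRemaining()) {
            channel.write(pending);
        }
        pending.clear();
    }

    /**
     * The proposed behaviour: write the buffered records AND force them
     * to the storage device before returning, so that the caller may
     * safely start writing non-log data afterwards.
     */
    void flushLog() throws IOException {
        writePending();
        channel.force(false); // false: file content only, not metadata
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Without the force() call, the OS is free to write the non-log data
that follows before the log records, which is exactly the reordering I
suspect is happening after a forced reset.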

* It seems that flushLog() wouldn’t be called for each “operation” as
it is (mostly) only used when memory pressure kicks in, and therefore
the performance loss wouldn’t be dramatic.
* It would still be better for performance to move the log to a
separate file, so the fsync’ing could be restricted to the log,
instead of forcing all writes to the log + all writes to the main
storage area to hit the platter before returning.
* My proposed solution does not guarantee 100% invulnerability to
corruption in the face of drives that don’t honor fsync, i.e., drives
that reorder write commands even when told that there is supposed to
be an fsync in between. I don’t know whether the typical “buggy”
consumer drives do that, or whether they just return “it’s written” to
the OS immediately but keep the order of write commands correct
anyway. If the latter, corruption is avoided even with such drives
(although durability, which I don’t care about, is not guaranteed).
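
The "separate log file" idea from the second point could look roughly
like the sketch below. Again, this is not how H2 is structured today:
SplitStoreSketch and its methods are hypothetical names, and the only
assumption about the JDK is the documented behaviour of
FileChannel.write() and FileChannel.force().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical store with the log and the main storage area in separate files. */
final class SplitStoreSketch implements AutoCloseable {
    private final FileChannel log;
    private final FileChannel data;

    SplitStoreSketch(Path logFile, Path dataFile) throws IOException {
        log = FileChannel.open(logFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        data = FileChannel.open(dataFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE);
    }

    /** Appends a log record and syncs ONLY the log file. */
    void logRecord(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            log.write(buf);
        }
        log.force(false); // pending data-file writes are not forced here
    }

    /** Writes a page to the main storage area without syncing. */
    void writePage(long position, byte[] page) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(page);
        long pos = position;
        while (buf.hasRemaining()) {
            pos += data.write(buf, pos);
        }
    }

    /** Syncs the main storage area, e.g. before old log records are discarded. */
    void checkpoint() throws IOException {
        data.force(false);
    }

    @Override
    public void close() throws IOException {
        log.close();
        data.close();
    }
}
```

The point of the split is that logRecord() only waits for the (small,
sequential) log writes, while the page writes reach the disk whenever
the OS likes, until checkpoint() is called.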

Greetings,

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
