2015-06-28 Nicolas Barbier <[email protected]>:
> It seems that org.h2.store.FileStore.sync() is only called when a
> manual CHECKPOINT SYNC is issued (and also by
> PageStore.writeVariableHeader, which doesn’t seem to be relevant
> here). There are other callers of FileDisk.force(), but they don’t
> seem relevant to the case of “fsync after writing the log”.
>
> The WriterThread, which seems to perform the time-based flushing of
> the log, only calls PageStore.flushLog(), which calls PageLog.flush(),
> etc., which in the end doesn't seem to do any fsync’ing.
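To make “fsync after writing the log” concrete, here is a minimal Java sketch of the write-ahead ordering I mean. It is not H2 code. The class SingleFileStore and its methods appendLog(), flushLog() and writePage() are hypothetical, and the log region and data pages are assumed to share one file, as in the PageStore case above. The only real API used is java.nio.channels.FileChannel. force(false) asks the OS to push file content (not necessarily metadata) to the device; on Linux this is typically done with fdatasync.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical single-file store (NOT H2's PageStore): the log region and the
// data pages share one file, so making the log durable forces the whole file.
class SingleFileStore implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer logBuffer = ByteBuffer.allocate(4096); // sketch: no overflow handling
    private long logPos;           // next write position inside the log region
    private final long dataStart;  // data pages start right after the log region

    SingleFileStore(Path path, long logRegionSize) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        logPos = 0;
        dataStart = logRegionSize; // sketch: log growing past this is not handled
    }

    // Buffers a log record in memory; nothing reaches the OS yet.
    void appendLog(byte[] record) {
        logBuffer.put(record);
    }

    // Writes the buffered log records AND calls force(), so the log is on
    // stable storage before this method returns. Without force(), the OS
    // (or the drive) may let later page writes reach the disk first.
    void flushLog() throws IOException {
        logBuffer.flip();
        while (logBuffer.hasRemaining()) {
            logPos += channel.write(logBuffer, logPos); // positional write
        }
        logBuffer.clear();
        channel.force(false);
    }

    // Writes a data page; the caller must call flushLog() first (write-ahead rule).
    void writePage(int pageId, byte[] page) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(page);
        long pos = dataStart + (long) pageId * page.length;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
    }

    long logSize() {
        return logPos;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The point is only the ordering: flushLog() returns after force(), so a page written afterwards cannot hit the disk before the log records describing it (as long as the drive honors the flush).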
This analysis seems consistent with the fact that we often get corrupt databases after a forced computer reset (I assume that the OS typically reorders writes rather aggressively). It does not really explain why our databases also become corrupt after an OOM situation (so I guess that is a separate issue).

I guess it would be good to perform some strace testing on a trivial test case, to check whether H2 indeed behaves as I expect it to (i.e., that it omits calls to f[data]sync where I think it should make them). OTOH, having you confirm or reject my proposed solution would be even better :-).

Assuming for a second that my analysis above is correct, I would suggest letting PageStore.flushLog() just call sync() (i.e., replacing the current 3 lines of code in it with a call to the sync() method declared right underneath it). Callers of flushLog() seem to expect that they can start writing non-log data right after calling it, which is consistent with flushLog() being the place that should perform the fsync’ing.

* flushLog() doesn’t seem to be called for each “operation”: it is (mostly) only used when memory pressure kicks in, so the performance loss shouldn’t be dramatic.
* It would still be better for performance to move the log to a separate file, so that the fsync’ing could be restricted to the log, instead of forcing all writes to the log plus all writes to the main storage area to hit the platter before returning.
* My proposed solution does not guarantee 100% invulnerability to corruption on drives that don’t honor fsync, i.e., drives that reorder write commands even when told that there is supposed to be an fsync in between. I don’t know whether the typical “buggy” consumer drives do that, or whether they just report “it’s written” to the OS immediately but still keep the write commands in the correct order. If the latter, corruption is avoided even with such drives (although durability, which I don’t care about, is not).
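The separate-log-file variant from the second bullet could look roughly like the following. Again, this is a hypothetical sketch, not H2’s design: SplitLogStore, logAndFlush(), writePage() and checkpoint() are made-up names, and I assume that recovery replays the log, so data pages only need to be forced at a checkpoint. The counters exist only to show which file gets fsync’ed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout (NOT H2's PageStore): the log lives in its own file,
// so each log flush forces only the log file, not the main data file.
class SplitLogStore implements AutoCloseable {
    private final FileChannel log;
    private final FileChannel data;
    int logForces;   // number of force() calls on the log file
    int dataForces;  // number of force() calls on the data file

    SplitLogStore(Path logPath, Path dataPath) throws IOException {
        // APPEND: every write goes to the current end of the log file
        log = FileChannel.open(logPath, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        data = FileChannel.open(dataPath, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    // Appends a log record, then forces only the (sequentially written) log file.
    void logAndFlush(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            log.write(buf);
        }
        log.force(false);
        logForces++;
    }

    // Data pages are written without force(); assumption: after a crash,
    // recovery would redo them from the already-durable log.
    void writePage(long offset, byte[] page) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(page);
        long pos = offset;
        while (buf.hasRemaining()) {
            pos += data.write(buf, pos);
        }
    }

    // Checkpoint: make the data file durable; only then could the log be truncated.
    void checkpoint() throws IOException {
        data.force(false);
        dataForces++;
    }

    @Override
    public void close() throws IOException {
        log.close();
        data.close();
    }
}
```

With this split, a log flush costs one fsync of a small, sequentially written file, and the random page writes are only forced at checkpoints.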
Greetings,
Nicolas
