On Aug 11, 2009, at 5:03 PM, Damien Katz wrote:
>> The worst problem is that the disk controller will reorder sector
>> writes to reduce seek time, which in effect means that if power is
>> lost, some random subset of the last writes may not happen. So you
>> won't just end up with a truncated file — you could have a file
>> that seems intact and has a correct header at the end, but has 4k
>> bytes of garbage somewhere within the last transaction. Does
>> CouchDB's file structure guard against that?
> First we fsync all the data and indexes, then we write and fsync the
> headers in a separate step.
Cool. From my discussions with Apple filesystem guru Dominic
Giampaolo, I gather that this two-phase approach is the right way to
guarantee consistency. (It's also used by the HFS+ filesystem to
secure its journal.)
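The two-phase pattern can be sketched roughly like this. This is just an illustration, not CouchDB's actual code; `commit_two_phase` and its parameters are hypothetical names. The key property is the ordering: the data fsync completes before the header is even written, so a crash at any point leaves either the old header or the new header with fully durable data behind it, never a header pointing at garbage.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of a data-then-header commit.
 * Phase 1: append the transaction data and force it to disk.
 * Phase 2: only then write and force the header that references it. */
static int commit_two_phase(int fd,
                            const void *data, size_t data_len,
                            const void *hdr, size_t hdr_len,
                            off_t hdr_off)
{
    if (write(fd, data, data_len) != (ssize_t)data_len)
        return -1;
    if (fsync(fd) != 0)           /* phase 1: data is durable */
        return -1;
    if (pwrite(fd, hdr, hdr_len, hdr_off) != (ssize_t)hdr_len)
        return -1;
    if (fsync(fd) != 0)           /* phase 2: header is durable */
        return -1;
    return 0;
}
```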
The caveat is that the fsyncs have to be the paranoid kind that flush
the disk-controller cache, not just the OS kernel cache. (This is what
the nonstandard F_FULLFSYNC mode does in Darwin/OS X; hopefully
CouchDB knows to use that when built for that platform.)
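A portable way to get the paranoid flush is to try F_FULLFSYNC where the platform defines it and fall back to plain fsync() elsewhere. The `full_fsync` helper below is a sketch of that idea, not anything CouchDB ships:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush writes all the way to the platters.
 * On Darwin/OS X, fcntl(fd, F_FULLFSYNC) asks the drive itself to
 * flush its write cache. It can fail on filesystems that don't
 * support it, and other platforms don't define it at all, so fall
 * back to an ordinary fsync() in both cases. */
static int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
#endif
    return fsync(fd);
}
```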
—Jens