Bah, that's quite right. Thanks for the step-by-step, I'm not sure how I missed it before.
Adam

On Apr 14, 2010, at 11:04 AM, Robert Newson wrote:

> I think Damien is right here. Consider this sequence:
>
> 1) update btree
> 2) fsync
> 3) write new header
> 4) fsync
> 5) more updates
> 6) fsync
> 7) write new header
> 8) process terminates
>
> On open, the header at 7) might or might not be flushed all the way to
> disk, but couchdb would update views to include changes made at 5).
> Since the header at 7) isn't definitely fsync'ed, a second crash (say,
> a kernel panic) could revert the .couch file itself to the state at
> 4), but views are permanently wrong. It's hard to see it in practice
> because the header is 4k and almost always gets to disk soon enough
> anyway, especially if you do more i/o on the view indexes.
>
> B.
>
> On Wed, Apr 14, 2010 at 3:46 PM, Adam Kocoloski <[email protected]> wrote:
>> Thanks Damien. I'm thinking that the situation you describe cannot occur if
>> before_header is enabled in the fsync_options, since any data pointed to by
>> the #db_header that the server found after the restart was already synced.
>> Is that correct?
>>
>> Adam
>>
>> On Apr 14, 2010, at 10:26 AM, Damien Katz wrote:
>>
>>> The reason for fsync on open is the server doesn't know if the data it's
>>> reading off the file is committed fully to the disk. It's possible that the
>>> server wrote to file and crashed before fsync, then restarted. Then it
>>> could refresh view indexes on the non-fsynced storage data, for example,
>>> and crash again, losing data in the storage file, but not the updates to
>>> the index file. Now the index is permanently out of date with the storage
>>> file. But if you fsync on opening the storage file, that can't happen.
>>>
>>> -Damien
>>>
>>>
>>> On Apr 14, 2010, at 5:52 AM, Adam Kocoloski wrote:
>>>
>>>> Initially posted on user@, but maybe it got lost in the noise. Does
>>>> anyone know why we call fsync when we open a file?
>>>>
>>>> Adam
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: Adam Kocoloski <[email protected]>
>>>>> Date: April 11, 2010 10:44:03 PM EDT
>>>>> To: [email protected]
>>>>> Subject: optimal settings for [couchdb] fsync_options?
>>>>>
>>>>> Hi folks, I wanted to assemble some concrete information about the
>>>>> purpose of each of the three fsync_options available in CouchDB and under
>>>>> what conditions they should be enabled/disabled. These options are
>>>>>
>>>>> 1) before_header - calls file:sync(Fd) before writing a DB header to
>>>>> disk. I believe the goal here is to prevent DB corruption by ensuring
>>>>> that all the data referred to by the header is durably stored before the
>>>>> header is written. A system that preserves write ordering could safely
>>>>> disable this option. Does anyone know an example of such a system?
>>>>> Perhaps a combination of a noop IO scheduler and a write-through or
>>>>> nonvolatile disk cache?
>>>>>
>>>>> 2) after_header - calls file:sync(Fd) immediately after writing the DB
>>>>> header. I think this one is done so that we don't lose too much data
>>>>> following a CouchDB restart, and so that a client can ensure that stored
>>>>> data will be retrievable after a restart by POSTing to
>>>>> /db/_ensure_full_commit. It might make sense to disable this option if
>>>>> e.g. you're relying on replication for durability. Although that's dicey
>>>>> because the replicator calls ensure_full_commit for both DBs before
>>>>> writing its own checkpoint record*, and by disabling the after_header
>>>>> option you'd run the risk of skipping updates on the target in the face
>>>>> of a power failure.
>>>>>
>>>>> 3) on_file_open - calls file:sync(Fd) immediately after opening a DB
>>>>> file. I really don't know the purpose of this one. Anyone?
>>>>>
>>>>> Best, Adam
>>>>>
>>>>> * The reason the replicator calls ensure_full_commit on the source is to
>>>>> detect situations where update_seqs might be reused. I wonder if we
>>>>> could engineer a way around that ever happening, for example by ensuring
>>>>> that on restart the update sequence jumps by a large number. But that's
>>>>> a discussion for d...@.
>>>>
>>>
>>
>>
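
For anyone reading this thread in the archive, Robert's eight-step sequence can be replayed with a toy model. This is only a sketch in Python, not CouchDB's Erlang code: ToyFile, last_header and run are made-up names, and the model assumes that nothing reaches disk without an explicit fsync (real kernels flush in the background, which is why the problem is hard to see in practice). A process crash keeps the OS cache intact; a kernel panic drops it.

```python
class ToyFile:
    """Append-only file with a volatile cache; a toy model, not real I/O."""

    def __init__(self):
        self.durable = []  # records that survive a kernel panic
        self.cached = []   # records written but not yet fsynced

    def write(self, record):
        self.cached.append(record)

    def fsync(self):
        self.durable.extend(self.cached)
        self.cached.clear()

    def contents(self):
        # What a reader sees: durable data plus whatever is still cached.
        return self.durable + self.cached

    def kernel_panic(self):
        self.cached.clear()  # unsynced writes are lost


def last_header(f):
    """Update sequence of the newest header visible in the file (0 if none)."""
    headers = [r for r in f.contents() if r[0] == "header"]
    return headers[-1][1] if headers else 0


def run(on_file_open):
    db, view = ToyFile(), ToyFile()
    # Steps 1-4: first commit, fully synced.
    db.write(("data", 1))
    db.fsync()
    db.write(("header", 1))
    db.fsync()
    # Steps 5-8: second commit; the process dies before the last fsync.
    db.write(("data", 2))
    db.fsync()
    db.write(("header", 2))
    # Restart after a process crash: the OS cache still holds header 2.
    if on_file_open:
        db.fsync()
    seq = last_header(db)
    view.write(("indexed_up_to", seq))
    view.fsync()
    # Second crash, this time a kernel panic.
    db.kernel_panic()
    view.kernel_panic()
    return last_header(db), view.contents()[-1][1]


if __name__ == "__main__":
    print("without on_file_open:", run(False))
    print("with on_file_open:   ", run(True))
```

In this model, run(False) leaves the database at header 1 while the view claims to be indexed up to 2, which is the "views are permanently wrong" case. run(True) keeps both at 2, because the sync on open makes header 2 durable before the view is refreshed.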
