Initially posted on user@, but maybe it got lost in the noise. Does anyone know why we call fsync when we open a file?
Adam

Begin forwarded message:

> From: Adam Kocoloski <[email protected]>
> Date: April 11, 2010 10:44:03 PM EDT
> To: [email protected]
> Subject: optimal settings for [couchdb] fsync_options?
>
> Hi folks, I wanted to assemble some concrete information about the purpose of
> each of the three fsync_options available in CouchDB and under what
> conditions they should be enabled/disabled. These options are
>
> 1) before_header - calls file:sync(Fd) before writing a DB header to disk. I
> believe the goal here is to prevent DB corruption by ensuring that all the
> data referred to by the header is durably stored before the header is
> written. A system that preserves write ordering could safely disable this
> option. Does anyone know an example of such a system? Perhaps a combination
> of a noop IO scheduler and a write-through or nonvolatile disk cache?
>
> 2) after_header - calls file:sync(Fd) immediately after writing the DB
> header. I think this one is done so that we don't lose too much data
> following a CouchDB restart, and so that a client can ensure that stored data
> will be retrievable after a restart by POSTing to /db/_ensure_full_commit.
> It might make sense to disable this option if e.g. you're relying on
> replication for durability. Although that's dicey because the replicator
> calls ensure_full_commit for both DBs before writing its own checkpoint
> record*, and by disabling the after_header option you'd run the risk of
> skipping updates on the target in the face of a power failure.
>
> 3) on_file_open - calls file:sync(Fd) immediately after opening a DB file. I
> really don't know the purpose of this one. Anyone?
>
> Best, Adam
>
> * The reason the replicator calls ensure_full_commit on the source is to
> detect situations where update_seqs might be reused. I wonder if we could
> engineer a way around that ever happening, for example by ensuring that on
> restart the update sequence jumps by a large number. But that's a discussion
> for d...@.
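
To make the write ordering behind options 1 and 2 concrete, here is a rough sketch in Python rather than Erlang. It is not CouchDB's code: the commit function, its parameters, and the byte strings standing in for document data and the DB header are all hypothetical, and os.fsync plays the role that file:sync(Fd) plays in the description above.

```python
import os
import tempfile


def commit(f, data, header, before_header=True, after_header=True):
    """Append data, then a header, syncing around the header write.

    Hypothetical stand-in for an append-only DB file commit. Returns the
    list of steps performed so the ordering is easy to inspect.
    """
    steps = []

    # Write the data that the header will refer to.
    f.write(data)
    f.flush()  # push Python's userspace buffer to the OS
    steps.append("write_data")

    if before_header:
        # Make the data durable before the header that points at it exists.
        os.fsync(f.fileno())
        steps.append("fsync")

    f.write(header)
    f.flush()
    steps.append("write_header")

    if after_header:
        # Make the header itself durable so the commit survives a restart.
        os.fsync(f.fileno())
        steps.append("fsync")

    return steps


def demo():
    """Run one commit with both syncs enabled against a temporary file."""
    with tempfile.TemporaryFile() as f:
        return commit(f, b"doc-bytes", b"header-bytes")
```

In this sketch, passing before_header=False only skips the first sync; whether that is safe depends entirely on the storage stack preserving write ordering, which is the open question in option 1.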
