Jeff Bonwick,

Do you agree that there is a major tradeoff in "builds up a wad of transactions in memory"?
We lose the changes if we have an unstable environment.

Thus, I don't quite understand why a 2-phase approach to commits isn't done. First, take the transactions as they come and do a minimal amount of delayed writing. If the number of transactions builds up, then convert to the delayed write scheme.

The assumption is that not all ZFS environments are write heavy; some are write-once, read-many. My assumption is that attribute/metadata reads outweigh all other accesses. Wouldn't this approach allow minimal outstanding transactions and favor read access? Yes, the assumption is that once the "wad" is started, the amount of writing could be substantial, and thus the bandwidth available for reading is reduced. This would then allow more N states to be available. Right?

Second, there are multiple uses of "then" (then pushes, then flushes all disk..., then writes the new uberblock, then flushes the caches again), which seem to have some level of possible parallelism that should reduce the latency from the start to the final write. Or did you just say that for simplicity's sake?

Mitchell Erblich
-------------------

Jeff Bonwick wrote:
>
> Toby Thain wrote:
> > I'm no guru, but would not ZFS already require strict ordering for its
> > transactions ... which property Peter was exploiting to get "fbarrier()"
> > for free?
>
> Exactly. Even if you disable the intent log, the transactional nature
> of ZFS ensures preservation of event ordering. Note that disk caches
> don't come into it: ZFS builds up a wad of transactions in memory,
> then pushes them out as a transaction group. That entire group will
> either commit or not. ZFS writes all the new data to new locations,
> then flushes all disk write caches, then writes the new uberblock,
> then flushes the caches again. Thus you can lose power at any point
> in the middle of committing transaction group N, and you're guaranteed
> that upon reboot, everything will either be at state N or state N-1.
>
> I agree about the usefulness of fbarrier() vs. fsync(), BTW. The cool
> thing is that on ZFS, fbarrier() is a no-op. It's implicit after
> every system call.
>
> Jeff
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
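
To make the ordering in the quoted text concrete (write new data to new locations, flush, write the new uberblock, flush again), here is a rough toy model in Python. It is only a sketch under stated assumptions: ToyDisk, commit_steps, read_state and run_trial are hypothetical names invented for this illustration, not ZFS code or any ZFS interface, and the "disk" is just two dictionaries. The toy cache also flushes all at once, so it does not model a real drive reordering cached writes.

```python
from itertools import islice


class ToyDisk:
    """Hypothetical disk with a volatile write cache (not a ZFS structure)."""

    def __init__(self):
        self.stable = {}  # contents that survive power loss
        self.cache = {}   # pending writes, lost on power loss

    def write(self, addr, value):
        self.cache[addr] = value

    def flush(self):
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


def commit_steps(disk, txg, changes):
    """Run the four ordered steps, yielding after each so a crash can be injected."""
    for key, value in changes.items():
        # New data goes to a new location; blocks of older groups are untouched.
        disk.write(("data", txg, key), value)
    yield "new data written"
    disk.flush()
    yield "caches flushed"
    disk.write("uberblock", txg)
    yield "uberblock written"
    disk.flush()
    yield "caches flushed again"


def read_state(disk):
    """Return (active txg, visible data) as seen after a reboot."""
    active = disk.stable["uberblock"]
    data = {}
    # Addresses sort by txg, so newer committed groups override older ones.
    for addr in sorted(a for a in disk.stable if a != "uberblock"):
        _, txg, key = addr
        if txg <= active:
            data[key] = disk.stable[addr]
    return active, data


def run_trial(crash_after):
    """Commit txg 2 on top of txg 1, losing power after `crash_after` steps (0..4)."""
    disk = ToyDisk()
    disk.write(("data", 1, "a"), "old")
    disk.write("uberblock", 1)
    disk.flush()  # state N-1 (txg 1) is fully on stable storage
    # Execute only the first `crash_after` steps, then cut power.
    for _ in islice(commit_steps(disk, 2, {"a": "new"}), crash_after):
        pass
    disk.power_loss()
    return read_state(disk)


if __name__ == "__main__":
    for k in range(5):
        print(k, run_trial(k))
```

In this toy, every crash point before the final flush leaves the reader at txg 1 with the old data, and only a completed commit shows txg 2. The new state only becomes visible with the last step, the flush that makes the uberblock durable.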