Jeff Bonwick,

        Do you agree that there is a major tradeoff in
        "builds up a wad of transactions in memory"?

        We lose those changes if we have an unstable
        environment.

        Thus, I don't quite understand why a two-phase
        approach to commits isn't used. First, take the
        transactions as they come and write them out with
        only a minimal delay. If the number of transactions
        builds up, then convert to the delayed-write scheme.
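        To make the idea concrete, here is a rough sketch of
        the policy I have in mind (plain C, hypothetical names
        and tunables, not the actual ZFS/DMU interfaces): push
        small wads of transactions almost immediately, and only
        fall back to building a large in-memory group once the
        backlog crosses a threshold.

        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical tunables and helpers -- not real ZFS code. */
        #define TXG_BATCH_THRESHOLD   256U   /* pending txs before batching */
        #define TXG_SMALL_DELAY_MS     10U   /* near-immediate push         */
        #define TXG_DEFAULT_DELAY_MS 5000U   /* normal delayed-write wait   */

        struct tx_state {
                uint64_t pending;            /* transactions not yet on disk */
        };

        static void
        schedule_sync(unsigned delay_ms)     /* stand-in for the real sync  */
        {
                printf("sync scheduled in %u ms\n", delay_ms);
        }

        /*
         * Two-phase policy: under light load, push each small wad out
         * after a minimal delay so little is lost on power failure;
         * once the backlog grows, revert to batching a large
         * transaction group in memory (the current behavior).
         */
        static void
        tx_arrived(struct tx_state *ts)
        {
                ts->pending++;
                if (ts->pending < TXG_BATCH_THRESHOLD)
                        schedule_sync(TXG_SMALL_DELAY_MS);
                else
                        schedule_sync(TXG_DEFAULT_DELAY_MS);
        }

        int
        main(void)
        {
                struct tx_state ts = { 0 };
                for (int i = 0; i < 300; i++)
                        tx_arrived(&ts);
                return (0);
        }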

        The assumption here is that not all ZFS environments
        are write-heavy; many see write-once, read-many access
        patterns. My assumption is that attribute/metadata
        reading outweighs all other accesses.
        
        Wouldn't this approach keep the number of outstanding
        transactions minimal and favor read access? Yes, the
        assumption is that once the "wad" is started, the amount
        of writing could be substantial and thus the amount of
        available bandwidth for reading is reduced. This would
        then also allow more of the intermediate N states to
        survive a failure. Right?

        Second, there are multiple uses of "then" (then pushes,
        then flushes all disk..., then writes the new uberblock,
        then flushes the caches again), which seems to leave
        some room for parallelism that should reduce the latency
        from the start to the final write. Or did you just say
        it that way for simplicity's sake?
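
        For my own reading, this is roughly how I understand the
        ordering (a sketch only, with made-up function names, not
        the actual spa_sync() code). The data writes inside the
        group look like the only step with internal parallelism;
        the cache flushes and the uberblock write look like hard
        ordering points, which may be why the latency cannot be
        collapsed much further:

        #include <stdio.h>

        /* Stand-ins for the real I/O steps -- hypothetical names. */
        static void write_new_data_blocks(void) { puts("write new data"); }
        static void flush_all_disk_caches(void) { puts("flush disk caches"); }
        static void write_new_uberblock(void)   { puts("write new uberblock"); }

        /*
         * The commit sequence as described: step 1 can issue many
         * I/Os concurrently, but steps 2-4 are barriers.  The data
         * must be durable before the uberblock points at it, and
         * the uberblock must be durable before the group counts as
         * committed -- this is what gives the state N vs. N-1
         * guarantee after a crash.
         */
        static void
        txg_commit_sketch(void)
        {
                write_new_data_blocks();     /* 1: parallel I/O     */
                flush_all_disk_caches();     /* 2: ordering barrier */
                write_new_uberblock();       /* 3: atomic root swap */
                flush_all_disk_caches();     /* 4: ordering barrier */
        }

        int
        main(void)
        {
                txg_commit_sketch();
                return (0);
        }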

        Mitchell Erblich
        -------------------
        

Jeff Bonwick wrote:
> 
> Toby Thain wrote:
> > I'm no guru, but would not ZFS already require strict ordering for its
> > transactions ... which property Peter was exploiting to get "fbarrier()"
> > for free?
> 
> Exactly.  Even if you disable the intent log, the transactional nature
> of ZFS ensures preservation of event ordering.  Note that disk caches
> don't come into it: ZFS builds up a wad of transactions in memory,
> then pushes them out as a transaction group.  That entire group will
> either commit or not.  ZFS writes all the new data to new locations,
> then flushes all disk write caches, then writes the new uberblock,
> then flushes the caches again.  Thus you can lose power at any point
> in the middle of committing transaction group N, and you're guaranteed
> that upon reboot, everything will either be at state N or state N-1.
> 
> I agree about the usefulness of fbarrier() vs. fsync(), BTW.  The cool
> thing is that on ZFS, fbarrier() is a no-op.  It's implicit after
> every system call.
> 
> Jeff
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
