Andrew,

Jim Dunham wrote:
ZFS the filesystem is always consistent on disk, and ZFS maintains filesystem consistency through coordination between the ZPL (ZFS POSIX Layer) and the ZIL (ZFS Intent Log). Unfortunately for SNDR, ZFS caches a lot of an application's filesystem data in the ZIL; that data is in memory, not yet written to disk, so SNDR does not know it exists. ZIL flushes to disk can be seconds behind the actual application writes completing, and if SNDR is running asynchronously, the replicated writes to the SNDR secondary can be additional seconds behind the actual application writes.

Unlike UFS filesystems, where lockfs -f or lockfs -w can be used, there is no 'supported' way to get ZFS to empty the ZIL to disk on demand.

I'm wondering if you really meant ZIL here, or ARC?

It is my understanding that the ZFS intent log (ZIL) satisfies POSIX requirements for synchronous transactions, and thus filesystem consistency. The ZFS adaptive replacement cache (ARC) is where uncommitted filesystem data is cached. So although unwritten filesystem data is allocated from the DMU and retained in the ARC, it is the ZIL which influences filesystem metadata and data consistency on disk.

In either case, creating a snapshot should get both flushed to disk, I think?

No. A ZFS snapshot is a control path operation, versus a data path operation, and (to the best of my understanding, and testing) has no influence over POSIX filesystem consistency. See the discussion here: http://www.opensolaris.org/jive/click.jspa?searchID=1695691&messageID=124809

Invoking a ZFS snapshot will assure that the snapshot itself is consistent on the replicated disk, but not that all actively open files are.

A simple test I performed to verify this was to append to a ZFS file (with no synchronous filesystem options set) a series of blocks, each containing a block order pattern. At some random point in this process, I took a ZFS snapshot and immediately dropped SNDR into logging mode. When importing the ZFS storage pool on the SNDR remote host, I could see the ZFS snapshot just taken, but neither the snapshot version of the file nor the file itself contained all of the data previously written to it.

I then retested with the file opened with O_DSYNC. Following the same test steps as above, both the snapshot version of the file and the file itself contained all of the data previously written to it.
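
For anyone wanting to repeat the append test, it can be approximated with a short script. This is only a sketch under stated assumptions, not the program used in the test above: the block size, the 8-byte sequence header, and the names write_pattern and count_intact_blocks are all hypothetical, chosen for illustration. The writer stamps each appended block with its sequence number, optionally opening the file with O_DSYNC; the checker counts how many leading blocks are present and in order.

```python
import os
import struct

BLOCK_SIZE = 8192             # assumed; the original test does not state a block size
HEADER = struct.Struct(">Q")  # assumed layout: 8-byte big-endian sequence number


def write_pattern(path, num_blocks, dsync=False):
    """Append num_blocks blocks to path, each stamped with its sequence number.

    Assumes the file is new or empty, so sequence numbers start at 0.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        # With O_DSYNC, write() returns only after the data has reached
        # stable storage. Fall back to O_SYNC where O_DSYNC is not defined.
        flags |= getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, flags, 0o644)
    try:
        for seq in range(num_blocks):
            block = memoryview(HEADER.pack(seq).ljust(BLOCK_SIZE, b"\0"))
            # os.write may write fewer bytes than requested, so loop.
            while len(block) > 0:
                written = os.write(fd, block)
                block = block[written:]
    finally:
        os.close(fd)


def count_intact_blocks(path):
    """Return the number of leading full blocks whose sequence numbers are in order."""
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                break  # end of file or a truncated final block
            (seq,) = HEADER.unpack_from(block)
            if seq != count:
                break  # gap or corruption in the pattern
            count += 1
    return count
```

The idea would be to run write_pattern on the primary while taking the snapshot and dropping SNDR into logging mode, then run count_intact_blocks against both the file and its snapshot copy after importing the pool on the remote host, comparing the result with the number of blocks the writer had completed.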

(If you don't actually need a snapshot, simply destroy it immediately afterwards.)

--
Andrew

Jim

