Andrew,
Jim Dunham wrote:
ZFS the filesystem is always on disk consistent, and ZFS does
maintain filesystem consistency through coordination between the
ZPL (ZFS POSIX Layer) and the ZIL (ZFS Intent Log). Unfortunately
for SNDR, ZFS caches a lot of an application's filesystem data in
the ZIL, therefore the data is in memory, not written to disk, so
SNDR does not know this data exists. ZIL flushes to disk can be
seconds behind the actual application writes completing, and if
SNDR is running asynchronously, these replicated writes to the SNDR
secondary can be additional seconds behind the actual application
writes.
Unlike UFS filesystems with lockfs -f or lockfs -w, there is no
'supported' way to get ZFS to empty the ZIL to disk on demand.
I'm wondering if you really meant ZIL here, or ARC?
It is my understanding that the ZFS intent log (ZIL) satisfies POSIX
requirements for synchronous transactions, thus filesystem
consistency. The ZFS adaptive replacement cache (ARC) is where
uncommitted filesystem data is being cached. So although unwritten
filesystem data is allocated from the DMU and retained in the ARC, it is
the ZIL which influences filesystem metadata and data consistency on disk.
In either case, creating a snapshot should get both flushed to disk,
I think?
No. A ZFS snapshot is a control path, versus data path, operation and
(to the best of my understanding, and testing) has no influence over
POSIX filesystem consistency. See the discussion here: http://www.opensolaris.org/jive/click.jspa?searchID=1695691&messageID=124809
Invoking a ZFS snapshot will ensure that the snapshot itself is consistent
on the replicated disk, but not that all actively open files are.
A simple test I performed to verify this was to append to a ZFS file
(with no synchronous filesystem options set) a series of blocks, each
containing a block order pattern. At some random point in this
process, I took a ZFS snapshot and immediately dropped SNDR into logging
mode. When importing the ZFS storage pool on the SNDR remote host, I
could see the ZFS snapshot just taken, but neither the snapshot
version of the file nor the file itself contained all of the data
previously written to it.
I then retested, but opened the file with O_DSYNC, and when following
the same test steps above, both the snapshot version of the file, and
the file itself contained all of the data previously written to it.
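
For anyone wanting to repeat this, here is a minimal sketch of the kind of
writer and checker the test above describes. It is not the program actually
used. The block size, the sequence-number pattern and the function names
(make_block, append_blocks, count_intact_blocks) are all assumptions made
for illustration; the only thing taken from the test itself is the choice
between a plain open and an open with O_DSYNC.

```python
import os
import struct

BLOCK_SIZE = 8192                 # assumed; the original test did not state a block size
SEQ = struct.Struct(">Q")         # assumed pattern: 8-byte big-endian block sequence number


def make_block(seq: int) -> bytes:
    """Build one block whose contents encode its own position in the file."""
    return SEQ.pack(seq) * (BLOCK_SIZE // SEQ.size)


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of data, retrying if os.write() accepts only part of it."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def append_blocks(path: str, count: int, dsync: bool = False) -> None:
    """Append `count` ordered blocks; with dsync=True the file is opened O_DSYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        # Each write() returns only once the data has reached stable storage.
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    try:
        # Continue numbering from whatever whole blocks the file already holds.
        start = os.fstat(fd).st_size // BLOCK_SIZE
        for seq in range(start, start + count):
            write_all(fd, make_block(seq))
    finally:
        os.close(fd)


def count_intact_blocks(path: str) -> int:
    """Return how many leading blocks are complete and carry the expected pattern."""
    intact = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE or block != make_block(intact):
                break
            intact += 1
    return intact
```

The idea would be to run append_blocks() on the primary while noting how many
blocks it has reported as written, take the snapshot and drop SNDR into
logging mode, then run count_intact_blocks() against both the file and its
snapshot copy after importing the pool on the secondary. Comparing the two
counts with the number written shows how far the replicated data lags.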
(If you don't actually need a snapshot, simply destroy it
immediately afterwards.)
--
Andrew
Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss