A recent increase in email about ZFS and SNDR (the replication
component of Availability Suite), has given me reasons to post one of
my replies.
Well, now I'm confused! A colleague just pointed me towards your blog
entry about SNDR and ZFS which, until now, I thought was not a
supported configuration. So, could you confirm that for me one way
or the other?
ZFS is supported with SNDR, because SNDR is filesystem-agnostic. That
said, ZFS is a very different beast than other Solaris filesystems.
The two golden rules of ZFS replication are:
1). All volumes in a ZFS storage pool (see the output of zpool status)
must be placed in a single SNDR I/O consistency group. ZFS is the
first Solaris filesystem that validates consistency at all levels, so
all vdevs in a single storage pool must be replicated in a write-order
consistent manner, and an SNDR I/O consistency group is the means to
accomplish this.
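As a sketch, enabling replication for a two-vdev pool might look like
the following. The host names, device paths, bitmap volumes, and the
group name (zpool-tank) are all placeholders of my own invention, and
the exact sndradm syntax should be verified against the AVS release
you are running:

```shell
# Enable an SNDR set for each vdev in the pool, putting every set in
# the same I/O consistency group (the trailing "g zpool-tank") so the
# secondary always sees writes in a write-order consistent manner.
# All hosts, devices, and bitmaps below are illustrative placeholders.
sndradm -e primary /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
           secondary /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
           ip async g zpool-tank
sndradm -e primary /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 \
           secondary /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 \
           ip async g zpool-tank
```

If even one vdev of the pool is left out of the group, the secondary
copy of the pool can be internally inconsistent and unimportable.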
2). While SNDR replication is active, do not attempt to zpool import
the SNDR secondary volumes, and while the ZFS storage pool is imported
on the SNDR secondary node, do not resume replication. This is truly a
double-edged sword: the instance of ZFS running on the SNDR secondary
node will see replicated writes arriving from ZFS on the SNDR primary
node, treat the resulting unexpected checksums as some form of data
corruption, and panic Solaris. This is the same reason two or more
Solaris hosts can't access the same ZFS storage pool in a SAN.
There is a slight safety net here, in that zpool import will report
that the ZFS storage pool is active on another node. Unfortunately,
stopping replication does not change this state, so you will still
need to use the -f (force) option, unless the zpool is in the exported
state on the SNDR primary node, as the exported state will be
replicated to the SNDR secondary node.
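Putting rule 2 and the -f note together, a failover to the secondary
after the primary has died might be sketched as follows; the group and
pool names are placeholders, and the options should be checked against
your AVS release:

```shell
# On the SNDR secondary node, after the primary has failed:
sndradm -g zpool-tank -l    # put the I/O consistency group into
                            # logging mode, halting replication
zpool import -f tank        # -f is required: the pool still appears
                            # active on the (failed) primary node
```

The order matters: importing while replication is still active is
exactly the panic scenario described in rule 2 above.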
Of course I know that AVS only cares about blocks, so, in principle,
the FS is irrelevant. However, last time I was researching this, I
found a doc that explained that the lack of support was due to the
unpredictable nature of ZFS background processes (resilvering, etc.)
and the resulting inability to guarantee a truly quiesced FS.
ZFS the filesystem is always on-disk consistent, and ZFS does maintain
filesystem consistency through coordination between the ZPL (ZFS POSIX
Layer) and the ZIL (ZFS Intent Log). Unfortunately for SNDR, ZFS
caches a lot of an application's filesystem data in the ZIL, so that
data is in memory, not yet written to disk, and SNDR does not know it
exists. ZIL flushes to disk can be seconds behind the actual
application writes completing, and if SNDR is running asynchronously,
the replicated writes on the SNDR secondary can be additional seconds
behind the actual application writes.
Unlike UFS, with its lockfs -f and lockfs -w commands, there is no
'supported' way to get ZFS to empty the ZIL to disk on demand. So even
though one will get both ZFS and application filesystem consistency
within the SNDR secondary volume, there can be many seconds' worth of
lost data, since SNDR can't replicate what it does not see.
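Given that the exported state does replicate, the one clean way to
hand a pool over without losing in-flight data is a planned
switchover: export on the primary first, so everything is flushed to
disk and replicated before the secondary imports. A sketch, again with
placeholder group and pool names and options to be verified against
your AVS release:

```shell
# On the SNDR primary node (planned switchover):
zpool export tank           # flushes all pool state to disk and marks
                            # the pool exported; that state replicates
sndradm -g zpool-tank -w    # wait for the group's queued writes to drain
sndradm -g zpool-tank -l    # then drop the group into logging mode

# On the SNDR secondary node:
zpool import tank           # no -f needed: the pool arrived in the
                            # exported state
```

For an unplanned failover, there is no substitute for this sequence:
whatever was still in the ZIL on the primary is simply gone from the
secondary's point of view.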
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss