-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/163/
-----------------------------------------------------------
(Updated Feb. 23, 2015, 5:13 a.m.)
Review request for OpenZFS Developer Mailing List and Christopher Siden.
Changes
-------
Pull common bonus buffer eviction code from dnode_evict_dbufs() and dbuf_rele()
into dnode_evict_bonus().
Bugs: 5630
https://www.illumos.org/projects/illumos-gate//issues/5630
Repository: illumos-gate
Description (updated)
-------
Fix data corruption in ZFS triggered by rapidly destroying
and creating datasets while listing datasets in another
context.
dnode_sync_free() resets the in-core dnode_t of a just
deleted object so it represents an unallocated object. The
dnode_t will only be considered to represent a free dnode
after its last hold is released. This last hold may be
dropped after dnode_sync_free() returns. For data dbufs
holding the dnode, this delay can be caused by asynchronous
evictions from the ARC coinciding with the dnode being
destroyed. For bonus buffers, it can happen when the
object's type allows the object to be destroyed while it is
still held in another context.
The observed data corruption occurred when a dsl_dir was
destroyed as part of a recursive dataset destroy while a
"zfs list" operation also held that dsl_dir. Although
dnode_sync_free() calls dnode_evict_dbufs(), the hold on
the dsl_dir's bonus buffer prevented the buffer from being
evicted. This left the bonus buffer associated with the
dnode_t even after the last hold on the dnode was released.
Some time later, this dnode_t was reused for a new dsl_dir.
Instead of getting a new (and zero-filled) bonus buffer,
the contents of the old dsl_dir were returned. The dsl_dir
initialization code assumes the bonus buffer is zero-filled
and so explicitly initializes only the fields that have non-zero
initial values. This allowed the stale data to be incorporated
into the new dsl_dir and written to the media.
Bonus buffers are otherwise evicted only via
dnode_evict_dbufs(), and then only when there are no holds on
the bonus buffer, or via dnode_buf_pageout()/dnode_destroy().
The fix employed here
evicts the bonus buffer for a freed dnode when the bonus
buffer's last hold is dropped, but before the bonus buffer's
dnode hold is released. This ensures the dnode_t can only
be reused after the bonus buffer eviction completes.
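In sketch form, the centralized eviction helper looks roughly
like the following (the locking details, e.g. dn_struct_rwlock
guarding dn_bonus and dbuf_evict() expecting and consuming the
dbuf's mutex, follow the usual illumos conventions and are
assumptions about the actual diff):
```c
/*
 * Sketch of dnode_evict_bonus(): evict this dnode's bonus
 * buffer, if any, once nothing holds it.
 */
void
dnode_evict_bonus(dnode_t *dn)
{
	/* Taking dn_struct_rwlock as writer excludes new holds. */
	rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
	if (dn->dn_bonus != NULL &&
	    refcount_is_zero(&dn->dn_bonus->db_holds)) {
		mutex_enter(&dn->dn_bonus->db_mtx);
		dbuf_evict(dn->dn_bonus);	/* consumes db_mtx */
		dn->dn_bonus = NULL;
	}
	rw_exit(&dn->dn_struct_rwlock);
}
```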
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
	Add dnode_rele_and_unlock(), which avoids an extra lock
	drop and acquisition in contexts where the dn_mtx is
	already held.

	Extract the bonus buffer eviction code from
	dnode_evict_dbufs() into the new function
	dnode_evict_bonus().
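In sketch form, the split looks like this (assuming dn_holds is
a refcount protected by dn_mtx; the real function's last-hold
bookkeeping for the containing dnode buffer is elided):
```c
void
dnode_rele(dnode_t *dn, void *tag)
{
	mutex_enter(&dn->dn_mtx);
	dnode_rele_and_unlock(dn, tag);
}

void
dnode_rele_and_unlock(dnode_t *dn, void *tag)
{
	int64_t refs;

	/* Caller holds dn_mtx; drop the hold and the lock together. */
	refs = refcount_remove(&dn->dn_holds, tag);
	mutex_exit(&dn->dn_mtx);
	if (refs == 0) {
		/*
		 * Last hold: the real function releases the hold
		 * on the containing dnode buffer here (elided).
		 */
	}
}
```
Callers that already hold dn_mtx, such as the bonus buffer path
in dbuf_rele_and_unlock() below, call the _and_unlock() variant
directly instead of dropping and re-acquiring the mutex.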
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
	In dbuf_rele_and_unlock(), immediately evict bonus
	buffers associated with freed dnodes. The data in the
	bonus buffer is no longer relevant, and this prevents a
	stale bonus buffer from being associated with the dnode_t
	should the dnode_t be reused prior to being destroyed.

	In dbuf_rele_and_unlock(), hold the bonus buffer's mutex
	until after DB_DNODE_EXIT(). This prevents another thread
	(e.g. dmu_objset_evict()->dnode_evict_dbufs()) from
	evicting the bonus buffer and invalidating the dbuf's dnh
	before or during the DB_DNODE_ENTER/EXIT() calls.
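Putting the two dbuf.c changes together, the last-hold path for
a bonus dbuf would look roughly like this sketch (db->db_mtx is
held on entry, "holds" is the just-decremented hold count, and
the DMU_OT_NONE test for a freed dnode is an assumption about
the actual diff):
```c
if (holds == 0 && db->db_blkid == DMU_BONUS_BLKID) {
	dnode_t *dn;

	/*
	 * db_mtx is still held here, so a concurrent
	 * dmu_objset_evict()->dnode_evict_dbufs() cannot evict
	 * the bonus buffer and invalidate db's dnode handle
	 * while we are inside the DB_DNODE_ENTER/EXIT() pair.
	 */
	DB_DNODE_ENTER(db);
	dn = DB_DNODE(db);
	atomic_dec_32(&dn->dn_dbufs_count);
	DB_DNODE_EXIT(db);

	/* Another thread may evict db once its lock is dropped. */
	mutex_exit(&db->db_mtx);

	/*
	 * The dnode was freed: its bonus contents are stale, so
	 * evict the bonus buffer before the dnode hold below is
	 * released.  A recycled dnode_t can then never inherit
	 * the old contents.
	 */
	if (dn->dn_type == DMU_OT_NONE)
		dnode_evict_bonus(dn);

	/* Release this dbuf's hold on the dnode. */
	mutex_enter(&dn->dn_mtx);
	dnode_rele_and_unlock(dn, db);
}
```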
Diffs (updated)
-----
usr/src/uts/common/fs/zfs/sys/dnode.h 406954a65323b8de8aa28b07073f588bc1805062
usr/src/uts/common/fs/zfs/dnode_sync.c 6a0f8cc3bfa2813e58f19270b45703f408b3c807
usr/src/uts/common/fs/zfs/dnode.c 4e376fc4d920fcc3f3429a359ed394952d116b42
usr/src/uts/common/fs/zfs/dbuf.c c735c9694ff905586a26d33d6ffd36cfa41ce4cb
Diff: https://reviews.csiden.org/r/163/diff/
Testing
-------
ZFS test suite on FreeBSD. ztest on illumos.
Continuous loop on FreeBSD of recursively receiving and destroying a tree of 18
file systems while running management code that responds to ZFS create/destroy
events and enumerates file systems. Prior to this fix, this test
would fail within one hour; it has now run for 2.5 days without
failure.
Additional stress test:
Thread 1:
```
while true; do
zpool create foo da1
zpool export foo
zpool import foo
zpool destroy foo
done
```
Thread 2:
```
while true; do
zpool status foo >& /dev/null
done
```
Thanks,
Justin Gibbs