-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/163/
-----------------------------------------------------------

(Updated Feb. 17, 2015, 6:20 a.m.)


Review request for OpenZFS Developer Mailing List and Christopher Siden.


Changes
-------

Fix lock order reversal (db_mtx before dn_mtx).  Hold db_mtx all the way to 
dbuf_evict() so the bonus buffer cannot be held, released, or evicted from 
another thread context.


Bugs: 5630
    https://www.illumos.org/projects/illumos-gate//issues/5630


Repository: illumos-gate


Description
-------

Fix data corruption in ZFS triggered by rapidly destroying
and creating datasets while listing datasets in another
context.

dnode_sync_free() resets the in-core dnode_t of a just
deleted object so it represents an unallocated object.  The
dnode_t will only be considered to represent a free dnode
after its last hold is released.  This last hold may be
dropped after dnode_sync_free() returns.  For data dbufs
holding the dnode, this delay can be due to asynchronous
evictions from the arc coincident with the dnode being
destroyed.  For bonus buffers, this can happen if the object
type can be destroyed even when held in another context.

The observed data corruption occurred when a dsl_dir was
destroyed for a recursive dataset destroy, while a "zfs
list" operation also held this dsl_dir.  Although
dnode_sync_free() calls dnode_evict_dbufs(), the hold on
the dsl_dir's bonus buffer prevented it from being evicted.
This left the bonus buffer associated with the dnode_t even
after the last hold on the dnode was released.

Some time later, this dnode_t was reused for a new dsl_dir.
Instead of getting a new (and zero filled) bonus buffer,
the contents from the old dsl_dir were returned.  The dsl_dir
initialization code assumes the bonus buffer is zero filled
and so only explicitly initializes fields that have non-zero
initial values.  This allowed the stale data to be incorporated
into the new dsl_dir and written to the media.

Bonus buffers are only evicted via dnode_evict_dbufs() when
there are no holds on the bonus buffer, and via
dnode_buf_pageout/dnode_destroy().  The fix employed here
evicts the bonus buffer for a freed dnode when the bonus
buffer's last hold is dropped, but before the bonus buffer's
dnode hold is released.  This ensures the dnode_t can only
be reused after the bonus buffer eviction completes.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
        Add dnode_rele_and_unlock(), which avoids an extra
        lock drop and acquisition in contexts where the
        dn_mtx is already held.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
        In dbuf_rele_and_unlock(), immediately evict bonus
        buffers associated with freed dnodes.  The data in
        the bonus buffer is no longer relevant and this
        prevents a stale bonus buffer from being associated
        with the dnode_t should the dnode_t be reused prior
        to being destroyed.


Diffs (updated)
-----

  usr/src/uts/common/fs/zfs/dbuf.c c735c9694ff905586a26d33d6ffd36cfa41ce4cb 
  usr/src/uts/common/fs/zfs/sys/dnode.h 
406954a65323b8de8aa28b07073f588bc1805062 
  usr/src/uts/common/fs/zfs/dnode.c 4e376fc4d920fcc3f3429a359ed394952d116b42 

Diff: https://reviews.csiden.org/r/163/diff/


Testing
-------

ZFS test suite on FreeBSD.

Continuous loop on FreeBSD of recursively receiving and destroying a tree of 18 
file systems while running management code that responds to ZFS create/destroy 
events and enumerates file systems.  This test would fail within 1 hour prior 
to this fix.  It has run now for 2.5 days without failure.


Thanks,

Justin Gibbs

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to