-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/131/
-----------------------------------------------------------

(Updated Dec. 16, 2014, 7:32 p.m.)


Review request for OpenZFS Developer Mailing List.


Changes
-------

Respond to review comment.


Bugs: 5056
    https://www.illumos.org/projects/illumos-gate//issues/5056


Repository: illumos-gate


Description
-------

Various improvements the dmu buf user API.

Submitted by:   Justin Gibbs <[email protected]>
Submitted by:   Will Andrews <[email protected]>
Sponsored by:   Spectra Logic Corporation

Collect dmu buf user API support data into a new structure,
dmu_buf_user_t.  Consumers of this interface must include a
dmu_buf_user_t as a member of the user data structure that will be
attached to a dmu buffer.  This reduces the size of dmu_buf_impl_t
by two pointers.

Queue dmu buf user eviction processing to a taskq.  This prevents
FreeBSD witness(4) lock-order reversal warnings, potential deadlocks,
and reduces stack depth.

Convert objset eviction from a synchronous to an asynchronous process
to accommodate the asynchronous invocation, via dmu buf user eviction,
of dnode_buf_pageout().

Modify existing users of the dmu buf user API to never access the dbuf to
which their data was attached after user eviction has occurred.  Accessing
the dbuf from the callback is no longer safe now that callbacks occur
without the locks that used to protect them.  Enforce this in ZFS_DEBUG
kernel builds by clearing the user's pointer to the dbuf, if any, at
the time of eviction.  Callbacks have also been modified to clear their
dbuf pointer so most errors are caught even on non ZFS_DEBUG kernels.
However, this will not catch accesses from other contexts that occur
between the time of eviction and the processing of the callback.

Clarify programmer intent and improve readability by providing
specialized functions for the common user data update actions "remove"
and "replace".

Provide code-comment documentation for each API call.

Perform runtime validation of proper API usage on ZFS_DEBUG kernels.

uts/common/fs/zfs/sys/dbuf.h:
uts/common/fs/zfs/dbuf.c:
        Add dbuf_verify_user() and call it from the dbuf user API and
        during dbuf eviction processing to verify dbuf user API state.

        Replace calls to dbuf_set_data(db, NULL) with more explicit
        db_clear_data().  dbuf_set_data() now asserts that its buffer
        argument is never NULL.

        Implement new dmu buf API functions.

        Add the dmu_evict_taskq for processing dmu buf user evictions.
        Add dmu_buf_user_evict_wait() which allows spa, dsl pool, and
        dmu close/fini functions to drain pending user evictions.

        In dbuf_rele_and_unlock(), immediately evict dbufs with a zero
        refcount for an objset that is being evicted.  This allows
        the indirect dbufs in a dnode to be evicted asynchronously
        after the zero refcount dbufs that reference them are cleared
        via dmu_objset_evict()->dnode_evict_dbufs().

uts/common/fs/zfs/sys/dmu_objset.h:
uts/common/fs/zfs/dmu_objset.c:
        End the practice of including special dnodes in os->os_dnodes.
        This allows os->os_dnodes to be managed completely by one
        eviction path: dnode_buf_pageout()->dnode_destroy().

        Split objset eviction processing into two pieces.  The first
        marks the objset as evicting, evicts any dbufs that have a
        refcount of zero, and then queues up the objset for the second
        phase of eviction.  Once os->os_dnodes has been cleared by
        dnode_buf_pageout()->dnode_destroy(), the second phase is executed.
        The second phase closes the special dnodes, dequeues the objset
        from the list of those undergoing eviction, and finally frees
        the objset.

        NOTE: Due to asynchronous eviction processing (invocation of
              dnode_buf_pageout()), it is possible for the meta dnode for the
              objset to have no holds even though os->os_dnodes is not empty.

uts/common/fs/zfs/sys/dnode.h:
uts/common/fs/zfs/dnode.c:
        Collapse the initialization of a dnode from dnode_hold_impl()
        into dnode_create().  Since we already grab os_lock, use it to
        provide mutual exclusion for dnh->dnh_dnode and to arbitrate
        the winner of the initial open race.  The only way to unset the
        handle is to page out an entire set of dnodes, which uses the
        user eviction mechanism to arbitrate.

        In dnode_destroy(), invoke final stage of objset eviction if
        the destroyed dnode is the last regular dnode in the objset.

        Modify dnode_buf_pageout() so that it doesn't reference the
        evicted dbuf.

uts/common/fs/zfs/dnode_sync.c:
        In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs.
        This is possible now that objset eviction is asynchronously
        completed in a different context once dbuf eviction completes.

        In the case of objset eviction, any dbufs held by children will
        be evicted via dbuf_rele_and_unlock() once their refcounts go
        to zero.  Even when objset eviction is not active, the ordering
        of the avl tree guarantees that children will be released before
        parents, allowing the parent's refcounts to naturally drop to
        zero before they are inspected in this single loop.

uts/common/fs/zfs/sys/spa.h:
uts/common/fs/zfs/sys/spa_impl.h:
uts/common/fs/zfs/spa.c:
uts/common/fs/zfs/spa_misc.c:
        Track evicting objsets in the spa.

        When recording or validating the min spa reference count, wait
        for objset eviction processing to complete so that the reference
        count is stable.

uts/common/fs/zfs/sys/spa.h:
uts/common/fs/zfs/sys/dsl_dir.h:
uts/common/fs/zfs/spa_misc.c:
uts/common/fs/zfs/dsl_dir.c:
        Add spa_async_close() and dsl_dir_async_rele(). These APIs are
        used during the async eviction process of dsl datasets and dirs to
        indicate that the normal rules for releasing a reference count on
        the spa do not apply.  Async releases occur from a taskq without
        the namespace lock held and may be for objects contributing to
        spa_minref (e.g. during pool export).  Thus these APIs do not
        enforce the namespace lock being held or the spa refcount being
        greater than spa_minref on entry.

uts/common/fs/zfs/dsl_deadlist.c: uts/common/fs/zfs/dsl_dataset.c:
        Modify dsl_dataset_evict() so that it doesn't reference the
        evicted dbuf to determine if the deadlist needs to be closed.
        This is achieved by checking ds->ds_deadlist.dl_os instead
        which is now properly cleared when a deadlist is closed prior
        to eviction.

uts/common/fs/zfs/dsl_pool.c:
        In dsl_pool_close(), flush the user evictions for any just
        released dsl dirs with a call to dmu_buf_user_evict_wait().
        The dsl dirs have back references to the dsl_pool_t which are
        accessed during eviction so these must complete before the
        dsl_pool_t is destroyed.

uts/common/fs/zfs/dsl_dir.c: uts/common/fs/zfs/sys/dsl_dir.h:
uts/common/fs/zfs/dsl_dataset.c: uts/common/fs/zfs/sys/dsl_dataset.h:
uts/common/fs/zfs/dsl_dir.c: uts/common/fs/zfs/dsl_prop.c:
uts/common/fs/zfs/sa.c: uts/common/fs/zfs/sys/sa_impl.h:
uts/common/fs/zfs/zap.c: uts/common/fs/zfs/sys/zap_impl.h:
uts/common/fs/zfs/sys/zap_leaf.h: uts/common/fs/zfs/zap_micro.c:
        Conform to new dbuf user API.

uts/common/fs/zfs/sys/dsl_dataset.h:
uts/common/fs/zfs/dmu_objset.c:
uts/common/fs/zfs/dmu_send.c:
uts/common/fs/zfs/dmu_traverse.c
uts/common/fs/zfs/dsl_bookmark.c:
uts/common/fs/zfs/dsl_dataset.c:
uts/common/fs/zfs/dsl_deleg.c:
uts/common/fs/zfs/dsl_destroy.c:
uts/common/fs/zfs/dsl_prop.c:
uts/common/fs/zfs/dsl_scan.c:
uts/common/fs/zfs/dsl_userhold.c:
uts/common/fs/zfs/zil.c:
        Record whether or not a dsl dataset is a snapshot in the upon its
        creation in the ds_is_snapshot field rather than rely on access to
        the ds_num_children in the dsl_dataset_phys_t.  This ensures that
        the snapshot trait can be determined during eviction processing
        when the dbuf holding the dsl_dataset_phys_t is no longer
        available.  The conditional snapshot logic in dmu_objset_evict()
        is currently the only place where a ds_is_snapshot is tested
        during eviction.

uts/common/fs/zfs/sys/dmu.h:
        Add prototypes, data structures, and code comment documentation
        for the dmu buf user api.

        Pull in ASSERT() via zfs_context.h.

uts/common/fs/zfs/zfs_sa.c:
        Use zfs_context.h instead of manual includes of sys/types.h
        and sys/params.h so this file is compatible with the use of
        zfs_context.h in sys/dmu.h.

cmd/truss/Makefile.com: cmd/zhack/Makefile.com:
cmd/zinject/Makefile.com: lib/brand/solaris10/s10_brand/Makefile.com:
lib/libzfs/Makefile.com:
        Disable E_STATIC_UNUSED since lint complains about unused inline
        functions in header files, even though they are "inline", not
        "static inline".


Diffs (updated)
-----

  usr/src/uts/common/fs/zfs/dnode.c 250dd9f9415ed1d670de2aef983d022002fa169a 
  usr/src/uts/sparc/stmf_sbd/Makefile a57ccaf73d2a002d96232e0a84e79e1b391e9e6b 
  usr/src/uts/intel/stmf_sbd/Makefile a57ccaf73d2a002d96232e0a84e79e1b391e9e6b 
  usr/src/uts/intel/dev/Makefile b5c7c1a9c8defb5eaa1e09bf03a8e5a877369477 
  usr/src/uts/common/fs/zfs/zil.c 6beacadf710d650c9f1776f04ed3b1c6e27f4d6e 
  usr/src/uts/common/fs/zfs/zfs_sa.c ed5f27647510d51ec35b3e1bd213dec892d2d6dd 
  usr/src/uts/common/fs/zfs/zap_micro.c 
81ea4f730f97251ec54aefa91ae9de70e356ca8a 
  usr/src/uts/common/fs/zfs/zap.c de2d4deb00ebe665939bbbb64088c9eedacaac68 
  usr/src/uts/common/fs/zfs/sys/zap_leaf.h 
cd8b74a77a8c62098c701aac01c6bedd57f20a6b 
  usr/src/uts/common/fs/zfs/sys/zap_impl.h 
0d4fc69815fcadf7323206aad69383a6ef9e8662 
  usr/src/uts/common/fs/zfs/sys/spa_impl.h 
48b28eb5a13964b784174aa4a7c08f964185691d 
  usr/src/uts/common/fs/zfs/sys/spa.h cf307a2ec977642d2cb79963ed3abc0039e4f07b 
  usr/src/uts/common/fs/zfs/sys/sa_impl.h 
6b9af2ef4f89760047d1f8c67bf5e641b6f08de7 
  usr/src/uts/common/fs/zfs/sys/sa.h bc89fa07d222c0007c45e390a8ae81a3e1708a8f 
  usr/src/uts/common/fs/zfs/sys/dsl_dir.h 
772bfbe6db1267472bcb1d4ad20bc959f05b665d 
  usr/src/uts/common/fs/zfs/sys/dsl_dataset.h 
3160a05a8c59a9d504dd2e34b78296a377975a0a 
  usr/src/uts/common/fs/zfs/sys/dnode.h 
5668af1fef08f5785173af7d3ee0bb41d27d1e12 
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h 
804f0c182b6f193590a822513f532e9cc5d400c0 
  usr/src/uts/common/fs/zfs/sys/dmu.h 24473daa6d2b5f9b5af2b4857d4f6e4b6d1abc7f 
  usr/src/uts/common/fs/zfs/sys/dbuf.h 8be8ed6bc85bf1a30673d3755c40f6b02a5c11b2 
  usr/src/uts/common/fs/zfs/spa_misc.c 1729ba0503be512ca2b2a22165abf34b25fc0f0b 
  usr/src/uts/common/fs/zfs/spa.c d0408a58023cc33ae362ab0027619b4f70a405f8 
  usr/src/uts/common/fs/zfs/sa.c 27d8513541a45e158321fd2ef3c54de9a02ac5ee 
  usr/src/uts/common/fs/zfs/dsl_userhold.c 
0985ec91fd50c8a1234973408d63e3f21631e597 
  usr/src/uts/common/fs/zfs/dsl_scan.c 5c9e68fb2ee68210c2450b1b5380f672a715f677 
  usr/src/uts/common/fs/zfs/dsl_prop.c 4e93e43f96983f6e3fd6fed1f3bc49a4ed55b96d 
  usr/src/uts/common/fs/zfs/dsl_pool.c 39a797c27958b924d41309d7c671ba48fc5d848d 
  usr/src/uts/common/fs/zfs/dsl_dir.c f49584168344629463041e177d3202fa6d4e396b 
  usr/src/uts/common/fs/zfs/dsl_destroy.c 
a776e144f1d2b02224d03c7e3fcfd14fd59d4ab9 
  usr/src/uts/common/fs/zfs/dsl_deleg.c 
dc405ebe58b522eb3612aaae4585efb8fa269ba2 
  usr/src/uts/common/fs/zfs/dsl_deadlist.c 
4ac562bfdba32382d2d1d807293640c9c1430657 
  usr/src/uts/common/fs/zfs/dsl_dataset.c 
5baf5c3c0a11db1deb546fa3b365f6faa1bfd4ca 
  usr/src/uts/common/fs/zfs/dsl_bookmark.c 
5fb7f96600dc6f4e823b703a20115e1340c80bf7 
  usr/src/uts/common/fs/zfs/dnode_sync.c 
63bfc94f9aee862354726fbbc2ede0bec3b6b156 
  usr/src/uts/common/fs/zfs/dmu_traverse.c 
bffdff65d6272249c26e020fad4a630a6bd3fc26 
  usr/src/uts/common/fs/zfs/dmu_send.c dc9f81cd2f28ef316b2425396393e14410be4f60 
  usr/src/uts/common/fs/zfs/dmu_objset.c 
e39264cc2f9324fb17746876b80d1454f1920b7d 
  usr/src/uts/common/fs/zfs/dbuf.c bcf66bc53d7b5b3320f78679a9a631e16adb8c10 
  usr/src/lib/libzfs/Makefile.com e7d33f3a028c56a6ba6e7e53a3aca51974a33505 
  usr/src/lib/brand/solaris10/s10_brand/Makefile.com 
022b31b4e7d2671eabdb7326d6c4f1a7c226a5f5 
  usr/src/cmd/zinject/Makefile.com 76d297937f13b863a7d1b75add9e7f18f8761c2d 
  usr/src/cmd/zhack/Makefile.com 67927083c4311afec5bc1ab4b18958618b41b5f5 
  usr/src/cmd/truss/Makefile.com b50028339965450aee9a5fb5f77a2fc972f2e755 

Diff: https://reviews.csiden.org/r/131/diff/


Testing
-------

ztest and zfs test suite on FreeBSD and illumos.


Thanks,

Justin Gibbs

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to