> On Sept. 15, 2015, 4:24 a.m., Prakash Surya wrote:
> > I've kicked off revision 3 this review to our internal ZFS regression 
> > suite; for those on Delphix's VPN the link is here: 
> > http://jenkins.delphix.com/job/zfs-precommit/2968/
> > 
> > I'll add another comment to this review once I get the results from that 
> > test run.

ztest failed with the following:
```
umem allocator: bufctl corrupted
buffer=80f9b28  bufctl=0  cache: umem_alloc_16
umem: heap corruption detected
stack trace:
libumem.so.1'umem_err_recoverable+0x4a
libumem.so.1'umem_error+0x492
libumem.so.1'umem_free+0xb7
libzpool.so.1'dsl_prop_fini+0x41
libzpool.so.1'dsl_dir_evict+0x101
libzpool.so.1'taskq_thread+0xd6
libc.so.1'_thrp_setup+0x88
libc.so.1'_lwp_start+0x0
child died with signal 6
```

and a diff between the last illumos master zfs-test suite failures, and this 
review; it looks like some new failures were introduced:
```
>     FAIL zvol/zvol_misc/zvol_misc_004_pos [no bug found]
>     FAIL online_offline/online_offline_002_neg [no bug found]
>     FAIL sparse/sparse_001_pos [no bug found]
>     FAIL nopwrite/nopwrite_recsize [no bug found]
>     FAIL nopwrite/nopwrite_sync [no bug found]
>     FAIL mmap/mmap_read_001_pos [no bug found]
>     FAIL nopwrite/nopwrite_varying_compression [no bug found]
>     FAIL nopwrite/nopwrite_volume [no bug found]
>     FAIL nopwrite/nopwrite_promoted_clone [no bug found]
>     FAIL nopwrite/nopwrite_mtime [no bug found]
>     FAIL mv_files/mv_files_002_pos [no bug found]
>     FAIL mv_files/mv_files_001_pos [no bug found]
>     FAIL online_offline/online_offline_001_pos [no bug found]
>     FAIL truncate/truncate_001_pos [no bug found]
```

let me know if I can copy core dumps, or test output somewhere to make 
debugging these failures possible.


- Prakash


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/235/#review779
-----------------------------------------------------------


On Sept. 15, 2015, 1:56 a.m., Justin Gibbs wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.csiden.org/r/235/
> -----------------------------------------------------------
> 
> (Updated Sept. 15, 2015, 1:56 a.m.)
> 
> 
> Review request for OpenZFS Developer Mailing List and Christopher Siden.
> 
> 
> Bugs: 6171
>     https://www.illumos.org/issues/6171
> 
> 
> Repository: illumos-gate
> 
> 
> Description
> -------
> 
> A change to a property on a dataset must be propagated to its descendants
> in case that property is inherited. For datasets whose information is
> not currently loaded into memory (e.g. a snapshot that isn't currently
> mounted), there is nothing to do; the property change will take effect
> the next time that dataset is loaded. To handle updates to datasets that
> are in-core, ZFS registers a callback entry for each property of each
> loaded dataset with the dsl directory that holds that dataset. There
> is a dsl directory associated with each live dataset that references
> both the live dataset and any snapshots of the live dataset. A property
> change is effected by doing a traversal of the tree of dsl directories
> for a pool, starting at the directory sourcing the change, and invoking
> these callbacks.
> 
> The current implementation both registers and de-registers properties
> individually for each loaded dataset. While registration for a property is
> O(1) (insert into a list), de-registration is O(n) (search list and then
> remove). The 'n' for de-registration, however, is not limited to the size
> (number of snapshots + 1) of the dsl directory. The eviction portion
> of the life cycle for the in core state of datasets is asynchronous,
> which allows multiple copies of the dataset information to be in-core
> at once. Only one of these copies is active at any time with the rest
> going through tear down processing, but all copies contribute to the
> cost of performing a dsl_prop_unregister().
> 
> One way to create multiple, in-flight copies of dataset information
> is by performing "zfs list" operations from multiple threads
> concurrently. In-core dataset information is loaded on demand and then
> evicted when reference counts drops to zero. For datasets that are not
> mounted, there is no persistent reference count to keep them resident.
> So, a list operation will load them, compute the information required to
> do the list operation, and then evict them. When performing this operation
> from multiple threads it is possible that some of the in-core dataset
> information will be reused, but also possible to lose the race and load
> the dataset again, even while the same information is being torn down.
> 
> Compounding the performance issue further is a change made for illumos
> issue 5056 which made dataset eviction single threaded.  In environments
> using automation to manage ZFS datasets, it is now possible to create
> enough of a backlog of dataset evictions to consume excessive amounts
> of kernel memory and to bog down the system.
> 
> The fix employed here is to make property de-registration O(1).  With this
> change in place, it is hoped that a single thread is more than sufficient
> to handle eviction processing. If it isn't, the problem can be solved
> by increasing the number of threads devoted to the eviction taskq.
> 
> uts/common/fs/zfs/dsl_dataset.c
> uts/common/fs/zfs/dsl_dir.c:
> uts/common/fs/zfs/dsl_prop.c:
> uts/common/fs/zfs/sys/dsl_dataset.h:
> uts/common/fs/zfs/sys/dsl_dir.h:
> uts/common/fs/zfs/sys/dsl_prop.h:
>       Associate dsl property callback records with both the
>       dsl directory and the dsl dataset that is registering the
>       callback. Both connections are protected by the dsl directory's
>       "dd_lock".
> 
>       When linking callbacks into a dsl directory, group them by
>       the property type. This helps reduce the space penalty for the
>       double association (the property name pointer is stored once
>       per dsl_dir instead of in each record) and reduces the number of
>       strcmp() calls required to do callback processing when updating
>       a single property. Property types are stored in a linked list
>       since currently ZFS registers a maximum of 10 property types
>       for each dataset.
> 
>       Note that the property buckets/records associated with a dsl
>       directory are created on demand, but only freed when the dsl
>       directory is freed. Given the static nature of property types
>       and their small number, there is no benefit to freeing the few
>       bytes of memory used to represent the property record earlier.
>       When a property record becomes empty, the dsl directory is either
>       going to become unreferenced a little later in this thread of
>       execution, or there is a high chance that another dataset is
>       going to be loaded that would recreate the bucket anyway.
> 
>       Replace multiple calls to dsl_prop_unregister() with a single
>       call to dsl_prop_unregister_all() (O(n^2) -> O(n)).  There are two
>       software layers in ZFS that use the property mechanism: the DMU,
>       and the ZPL. The DMU registers all of its callbacks with the
>       objset_t pointer of the dataset as the callback argument. The
>       ZPL uses the zfsvfs_t pointer. In both cases, these software
>       layers want to unregister all of the callbacks they registered
>       on a dataset, regardless of the property name. This new API uses
>       the callback argument pointer as a "software layer identifier"
>       for filtering. The list of callbacks is traversed once with
>       items selected for deletion by a pointer comparision.
> 
> uts/common/fs/zfs/dmu_objset.c:
> uts/common/fs/zfs/zfs_vfsops.c:
>       Replace use of dsl_prop_unregister() with the new
>       dsl_prop_unregister_all() API.
> 
> 
> Diffs
> -----
> 
>   usr/src/uts/common/fs/zfs/sys/dsl_prop.h 
> 5fe18d6a7c550b621e9df0ff8458b703c6f1bcf3 
>   usr/src/uts/common/fs/zfs/sys/dsl_dataset.h 
> 001bff566a888bd9ab1d9c0bc7a5243ae8ca4cb6 
>   usr/src/uts/common/fs/zfs/zfs_vfsops.c 
> 3559b9b7b32db88326f06faa42c83907ab1344e6 
>   usr/src/uts/common/fs/zfs/dsl_prop.c 
> 09eec21604f45aacecd6f87397954d50492a36e0 
>   usr/src/uts/common/fs/zfs/dsl_dir.c 
> 6a20ab3ca2adabddc81c6199374ed99e7ec920fb 
>   usr/src/uts/common/fs/zfs/dsl_dataset.c 
> 7f63cd97bcde4ba4a7708c1311c89ed93c8a8cfa 
>   usr/src/uts/common/fs/zfs/dmu_objset.c 
> 9ad18f0d11fbaa1edb14ac5f2adfd215cb4716b3 
>   usr/src/uts/common/fs/zfs/sys/dsl_dir.h 
> f50014d95a8109e6d715371eee0af641e92730ae 
> 
> Diff: https://reviews.csiden.org/r/235/diff/
> 
> 
> Testing
> -------
> 
> Ztest on illumos.
> 
> 
> Thanks,
> 
> Justin Gibbs
> 
>

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to