> On Sept. 15, 2015, 4:24 a.m., Prakash Surya wrote: > > I've kicked off revision 3 this review to our internal ZFS regression > > suite; for those on Delphix's VPN the link is here: > > http://jenkins.delphix.com/job/zfs-precommit/2968/ > > > > I'll add another comment to this review once I get the results from that > > test run.
ztest failed with the following: ``` umem allocator: bufctl corrupted buffer=80f9b28 bufctl=0 cache: umem_alloc_16 umem: heap corruption detected stack trace: libumem.so.1'umem_err_recoverable+0x4a libumem.so.1'umem_error+0x492 libumem.so.1'umem_free+0xb7 libzpool.so.1'dsl_prop_fini+0x41 libzpool.so.1'dsl_dir_evict+0x101 libzpool.so.1'taskq_thread+0xd6 libc.so.1'_thrp_setup+0x88 libc.so.1'_lwp_start+0x0 child died with signal 6 ``` and a diff between the last illumos master zfs-test suite failures, and this review; it looks like some new failures were introduced: ``` > FAIL zvol/zvol_misc/zvol_misc_004_pos [no bug found] > FAIL online_offline/online_offline_002_neg [no bug found] > FAIL sparse/sparse_001_pos [no bug found] > FAIL nopwrite/nopwrite_recsize [no bug found] > FAIL nopwrite/nopwrite_sync [no bug found] > FAIL mmap/mmap_read_001_pos [no bug found] > FAIL nopwrite/nopwrite_varying_compression [no bug found] > FAIL nopwrite/nopwrite_volume [no bug found] > FAIL nopwrite/nopwrite_promoted_clone [no bug found] > FAIL nopwrite/nopwrite_mtime [no bug found] > FAIL mv_files/mv_files_002_pos [no bug found] > FAIL mv_files/mv_files_001_pos [no bug found] > FAIL online_offline/online_offline_001_pos [no bug found] > FAIL truncate/truncate_001_pos [no bug found] ``` let me know if I can copy core dumps, or test output somewhere to make debugging these failures possible. - Prakash ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.csiden.org/r/235/#review779 ----------------------------------------------------------- On Sept. 15, 2015, 1:56 a.m., Justin Gibbs wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.csiden.org/r/235/ > ----------------------------------------------------------- > > (Updated Sept. 15, 2015, 1:56 a.m.) > > > Review request for OpenZFS Developer Mailing List and Christopher Siden. > > > Bugs: 6171 > https://www.illumos.org/issues/6171 > > > Repository: illumos-gate > > > Description > ------- > > A change to a property on a dataset must be propagated to its descendants > in case that property is inherited. For datasets whose information is > not currently loaded into memory (e.g. a snapshot that isn't currently > mounted), there is nothing to do; the property change will take effect > the next time that dataset is loaded. To handle updates to datasets that > are in-core, ZFS registers a callback entry for each property of each > loaded dataset with the dsl directory that holds that dataset. There > is a dsl directory associated with each live dataset that references > both the live dataset and any snapshots of the live dataset. A property > change is effected by doing a traversal of the tree of dsl directories > for a pool, starting at the directory sourcing the change, and invoking > these callbacks. > > The current implementation both registers and de-registers properties > individually for each loaded dataset. While registration for a property is > O(1) (insert into a list), de-registration is O(n) (search list and then > remove). The 'n' for de-registration, however, is not limited to the size > (number of snapshots + 1) of the dsl directory. The eviction portion > of the life cycle for the in core state of datasets is asynchronous, > which allows multiple copies of the dataset information to be in-core > at once. Only one of these copies is active at any time with the rest > going through tear down processing, but all copies contribute to the > cost of performing a dsl_prop_unregister(). > > One way to create multiple, in-flight copies of dataset information > is by performing "zfs list" operations from multiple threads > concurrently. In-core dataset information is loaded on demand and then > evicted when reference counts drops to zero. For datasets that are not > mounted, there is no persistent reference count to keep them resident. > So, a list operation will load them, compute the information required to > do the list operation, and then evict them. When performing this operation > from multiple threads it is possible that some of the in-core dataset > information will be reused, but also possible to lose the race and load > the dataset again, even while the same information is being torn down. > > Compounding the performance issue further is a change made for illumos > issue 5056 which made dataset eviction single threaded. In environments > using automation to manage ZFS datasets, it is now possible to create > enough of a backlog of dataset evictions to consume excessive amounts > of kernel memory and to bog down the system. > > The fix employed here is to make property de-registration O(1). With this > change in place, it is hoped that a single thread is more than sufficient > to handle eviction processing. If it isn't, the problem can be solved > by increasing the number of threads devoted to the eviction taskq. > > uts/common/fs/zfs/dsl_dataset.c > uts/common/fs/zfs/dsl_dir.c: > uts/common/fs/zfs/dsl_prop.c: > uts/common/fs/zfs/sys/dsl_dataset.h: > uts/common/fs/zfs/sys/dsl_dir.h: > uts/common/fs/zfs/sys/dsl_prop.h: > Associate dsl property callback records with both the > dsl directory and the dsl dataset that is registering the > callback. Both connections are protected by the dsl directory's > "dd_lock". > > When linking callbacks into a dsl directory, group them by > the property type. This helps reduce the space penalty for the > double association (the property name pointer is stored once > per dsl_dir instead of in each record) and reduces the number of > strcmp() calls required to do callback processing when updating > a single property. Property types are stored in a linked list > since currently ZFS registers a maximum of 10 property types > for each dataset. > > Note that the property buckets/records associated with a dsl > directory are created on demand, but only freed when the dsl > directory is freed. Given the static nature of property types > and their small number, there is no benefit to freeing the few > bytes of memory used to represent the property record earlier. > When a property record becomes empty, the dsl directory is either > going to become unreferenced a little later in this thread of > execution, or there is a high chance that another dataset is > going to be loaded that would recreate the bucket anyway. > > Replace multiple calls to dsl_prop_unregister() with a single > call to dsl_prop_unregister_all() (O(n^2) -> O(n)). There are two > software layers in ZFS that use the property mechanism: the DMU, > and the ZPL. The DMU registers all of its callbacks with the > objset_t pointer of the dataset as the callback argument. The > ZPL uses the zfsvfs_t pointer. In both cases, these software > layers want to unregister all of the callbacks they registered > on a dataset, regardless of the property name. This new API uses > the callback argument pointer as a "software layer identifier" > for filtering. The list of callbacks is traversed once with > items selected for deletion by a pointer comparision. > > uts/common/fs/zfs/dmu_objset.c: > uts/common/fs/zfs/zfs_vfsops.c: > Replace use of dsl_prop_unregister() with the new > dsl_prop_unregister_all() API. > > > Diffs > ----- > > usr/src/uts/common/fs/zfs/sys/dsl_prop.h > 5fe18d6a7c550b621e9df0ff8458b703c6f1bcf3 > usr/src/uts/common/fs/zfs/sys/dsl_dataset.h > 001bff566a888bd9ab1d9c0bc7a5243ae8ca4cb6 > usr/src/uts/common/fs/zfs/zfs_vfsops.c > 3559b9b7b32db88326f06faa42c83907ab1344e6 > usr/src/uts/common/fs/zfs/dsl_prop.c > 09eec21604f45aacecd6f87397954d50492a36e0 > usr/src/uts/common/fs/zfs/dsl_dir.c > 6a20ab3ca2adabddc81c6199374ed99e7ec920fb > usr/src/uts/common/fs/zfs/dsl_dataset.c > 7f63cd97bcde4ba4a7708c1311c89ed93c8a8cfa > usr/src/uts/common/fs/zfs/dmu_objset.c > 9ad18f0d11fbaa1edb14ac5f2adfd215cb4716b3 > usr/src/uts/common/fs/zfs/sys/dsl_dir.h > f50014d95a8109e6d715371eee0af641e92730ae > > Diff: https://reviews.csiden.org/r/235/diff/ > > > Testing > ------- > > Ztest on illumos. > > > Thanks, > > Justin Gibbs > >
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
