Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
During the analysis of a hang we were able to discovery that during calls to
dsl_dataset_hold_obj we could call dsl_bookmark_init_ds which would allocate
the DSL's ds_bookmarks AVL tree.
If we have multiple threads trying to instantiate this dsl_dataset_t then only
one of them will win and the other will be destroyed as seen in this code path:
if (err != 0 || winner != NULL) {
bplist_destroy(&ds->ds_pending_deadlist);
dsl_deadlist_close(&ds->ds_deadlist);
if (dsl_deadlist_is_open(&ds->ds_remap_deadlist))
dsl_deadlist_close(&ds->ds_remap_deadlist);
if (ds->ds_prev)
dsl_dataset_rele(ds->ds_prev, ds);
dsl_dir_rele(ds->ds_dir, ds);
mutex_destroy(&ds->ds_lock);
mutex_destroy(&ds->ds_opening_lock);
mutex_destroy(&ds->ds_sendstream_lock);
refcount_destroy(&ds->ds_longholds);
kmem_free(ds, sizeof (dsl_dataset_t));
if (err != 0) {
dmu_buf_rele(dbuf, tag);
return (err);
}
ds = winner;
Unfortunately, this logic is not destroying the AVL tree that was created by
dsl_bookmark_init_ds.
>From the escalation we see two caches which show the leak:
cache buf buf buf memory alloc alloc
name size in use total in use succeed fail
------------------------- ------ ------ ------ ---------- --------- -----
kmem_alloc_64 64 882692309 882726488 54G 34581813 0
kmem_alloc_160 160 882937431 882941325 134G 785394457 0
The sizes equate to the following allocations from dsl_bookmark_node_alloc:
static dsl_bookmark_node_t *
dsl_bookmark_node_alloc(char *shortname) {
dsl_bookmark_node_t *dbn = kmem_alloc(sizeof (*dbn), KM_SLEEP);
dbn->dbn_name = spa_strdup(shortname);
Running a Dtrace script on the customer's site we were able to confirm the leak:
2 49231 dsl_deadlist_close:entry
zfs`dsl_dataset_hold_obj+0x35c
zfs`dsl_dataset_hold+0x78
zfs`dmu_objset_hold+0x5b
zfs`zfs_ioc_snapshot_list_next+0x27
zfs`zfsdev_ioctl+0x517
genunix`cdev_ioctl+0x39
specfs`spec_ioctl+0x60
genunix`fop_ioctl+0x55
genunix`ioctl+0x9b
unix`sys_syscall+0x177
FAILURE: ds mssql_db_container-18503/mssql_timeflow-18504/external, winner
fffffea3e9f6d2c0, bookmarks 6350
This shows the calling stack and the fact that winner is set means that another
thread won the race and we leaked 6350 bookmarks.
There are several key elements that are required to hit issue:
- we must be using zvols
- we must have ZFS bookmarks
- the dataset must not be exported via iSCSI
Upstream bug: DLPX-58193
You can view, comment on, or merge this pull request online at:
https://github.com/openzfs/openzfs/pull/674
-- Commit Summary --
* 9680 dsl_dataset_hold_obj can leak bookmarks
-- File Changes --
M usr/src/uts/common/fs/zfs/dsl_dataset.c (1)
-- Patch Links --
https://github.com/openzfs/openzfs/pull/674.patch
https://github.com/openzfs/openzfs/pull/674.diff
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/674
------------------------------------------
openzfs: openzfs-developer
Permalink:
https://openzfs.topicbox.com/groups/developer/T71aedf822e89207d-Me05d066325e02b6ffab985d8
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription