Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Prakash Surya <[email protected]>

During the analysis of a hang we discovered that calls to
dsl_dataset_hold_obj() can call dsl_bookmark_init_ds(), which allocates the
dataset's ds_bookmarks AVL tree.
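
For context, dsl_bookmark_init_ds() builds an in-memory AVL tree with one
node per bookmark. A sketch of what it does, based on current OpenZFS (the
comparator name and other details in the Delphix tree may differ):

int
dsl_bookmark_init_ds(dsl_dataset_t *ds)
{
	...
	avl_create(&ds->ds_bookmarks, dsl_bookmark_compare,
	    sizeof (dsl_bookmark_node_t),
	    offsetof(dsl_bookmark_node_t, dbn_node));
	/*
	 * For every bookmark in the dataset's ZAP, a dsl_bookmark_node_t
	 * (plus a copy of its short name) is then allocated via
	 * dsl_bookmark_node_alloc() and inserted into the tree.
	 */
	...
}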

If multiple threads race to instantiate the same dsl_dataset_t, only one of
them wins; each loser's copy is destroyed, as seen in this code path:

	if (err != 0 || winner != NULL) {
		bplist_destroy(&ds->ds_pending_deadlist);
		dsl_deadlist_close(&ds->ds_deadlist);
		if (dsl_deadlist_is_open(&ds->ds_remap_deadlist))
			dsl_deadlist_close(&ds->ds_remap_deadlist);
		if (ds->ds_prev)
			dsl_dataset_rele(ds->ds_prev, ds);
		dsl_dir_rele(ds->ds_dir, ds);
		mutex_destroy(&ds->ds_lock);
		mutex_destroy(&ds->ds_opening_lock);
		mutex_destroy(&ds->ds_sendstream_lock);
		refcount_destroy(&ds->ds_longholds);
		kmem_free(ds, sizeof (dsl_dataset_t));
		if (err != 0) {
			dmu_buf_rele(dbuf, tag);
			return (err);
		}
		ds = winner;

Unfortunately, this logic does not destroy the AVL tree created by
dsl_bookmark_init_ds(), nor free the bookmark nodes in it.
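
The fix (the single-line change to dsl_dataset.c in this PR) is to tear the
bookmark tree down on this path as well. A minimal sketch, assuming
dsl_bookmark_init_ds() has a matching teardown helper named
dsl_bookmark_fini_ds(), as in current OpenZFS:

	if (err != 0 || winner != NULL) {
		bplist_destroy(&ds->ds_pending_deadlist);
		dsl_deadlist_close(&ds->ds_deadlist);
		if (dsl_deadlist_is_open(&ds->ds_remap_deadlist))
			dsl_deadlist_close(&ds->ds_remap_deadlist);
		/*
		 * Empty and destroy ds->ds_bookmarks, freeing each
		 * dsl_bookmark_node_t and its copied name.
		 */
		dsl_bookmark_fini_ds(ds);
		if (ds->ds_prev)
			dsl_dataset_rele(ds->ds_prev, ds);
		...
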
From the escalation we can see two kmem caches that show the leak:

cache                         buf        buf        buf     memory      alloc alloc
name                         size     in use      total     in use    succeed  fail
-------------------------- ----- ---------- ---------- ---------- ---------- -----
kmem_alloc_64                  64  882692309  882726488        54G   34581813     0
kmem_alloc_160                160  882937431  882941325       134G  785394457     0

These sizes line up with the two allocations made per bookmark in
dsl_bookmark_node_alloc(): presumably the dsl_bookmark_node_t itself from
kmem_alloc_160 and the spa_strdup() copy of the short name from kmem_alloc_64
(note the nearly identical in-use counts of the two caches, consistent with
one node plus one name per leaked bookmark):

static dsl_bookmark_node_t *
dsl_bookmark_node_alloc(char *shortname)
{
	dsl_bookmark_node_t *dbn = kmem_alloc(sizeof (*dbn), KM_SLEEP);
	dbn->dbn_name = spa_strdup(shortname);
	...

Running a DTrace script on the customer's system, we were able to confirm
the leak:

2  49231         dsl_deadlist_close:entry
              zfs`dsl_dataset_hold_obj+0x35c
              zfs`dsl_dataset_hold+0x78
              zfs`dmu_objset_hold+0x5b
              zfs`zfs_ioc_snapshot_list_next+0x27
              zfs`zfsdev_ioctl+0x517
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall+0x177

FAILURE: ds mssql_db_container-18503/mssql_timeflow-18504/external, winner fffffea3e9f6d2c0, bookmarks 6350

This shows the calling stack, and the fact that winner is set means that
another thread won the race; in this instance we leaked 6350 bookmarks.
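
A minimal sketch of the kind of D script that produces the output above
(assumptions: the fbt probe points shown here; the production script also
dereferenced the dsl_dataset_t to print the dataset name, the winner pointer,
and the bookmark count, which this sketch omits):

#!/usr/sbin/dtrace -s

/*
 * Track whether the current thread is inside dsl_dataset_hold_obj();
 * ++/-- copes with the recursive hold of ds_prev.
 */
fbt::dsl_dataset_hold_obj:entry
{
	self->in_hold++;
}

fbt::dsl_dataset_hold_obj:return
{
	self->in_hold--;
}

/*
 * Within dsl_dataset_hold_obj(), dsl_deadlist_close() is only reached
 * on the error/loser teardown path shown above, so each firing marks a
 * racing dsl_dataset_t being discarded.
 */
fbt::dsl_deadlist_close:entry
/self->in_hold/
{
	stack();
}
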
There are several key elements required to hit this issue:
- we must be using zvols
- we must have ZFS bookmarks
- the dataset must not be exported via iSCSI

Upstream bug: DLPX-58193
You can view, comment on, or merge this pull request online at:

  https://github.com/openzfs/openzfs/pull/674

-- Commit Summary --

  * 9680 dsl_dataset_hold_obj can leak bookmarks

-- File Changes --

    M usr/src/uts/common/fs/zfs/dsl_dataset.c (1)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/674.patch
https://github.com/openzfs/openzfs/pull/674.diff
