Steven, good observation, and thanks for the quick turnaround on this fix. I think that we should hold the dsl_pool_config lock while calling dsl_dataset_name(), because otherwise the dd_parent could change (via dsl_dir_rename_sync()) and the old parent be evicted while we are looking at it. This would be an exceedingly tight race condition, probably impossible to exercise in practice. But we should consider adding an assertion to this effect (having dsl_dir_name assert that the lock is held), and making sure that all callers adhere to this convention. That is outside the scope of this specific fix, however.
I'd suggest that we hold the lock until after dsl_dataset_name() is called, using logic like the old code to ensure we drop the lock in the error case as well; too bad this code is a bit ugly. --matt On Sat, Nov 16, 2013 at 4:22 PM, Steven Hartland <[email protected]>wrote: > Hi all looking for reviewers for: > http://cr.illumos.org/~webrev/steveh/illumos-4322/ > > Which addresses: > https://www.illumos.org/issues/4322 > > I've tested to confirm it fixes the issue but I have > a concern that the dsl_pool_config lock should be > held for dsl_dataset_name dsl_dataset_rele. > > The reason for this is that both the code path > prior commit: a7a845e4bf22fd1b2a284729ccd95c7370a0438c > did this: > dsl_pool_config_enter(dp, FTAG); > error = dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds); > if (error == 0) { > char name[MAXNAMELEN]; > dsl_dataset_name(ds, name); > dsl_dataset_rele(ds, FTAG); > dsl_pool_config_exit(dp, FTAG); > (void) zfs_unmount_snap(name); > } else { > dsl_pool_config_exit(dp, FTAG); > } > > As well as at least one other place I've found: > spa.c - spa_prop_get > dsl_pool_config_enter(dp, FTAG); > if (err = dsl_dataset_hold_obj(dp, > za.za_first_integer, FTAG, &ds)) { > dsl_pool_config_exit(dp, FTAG); > break; > } > > strval = kmem_alloc( > MAXNAMELEN + strlen(MOS_DIR_NAME) + 1, > KM_SLEEP); > dsl_dataset_name(ds, strval); > dsl_dataset_rele(ds, FTAG); > dsl_pool_config_exit(dp, FTAG); > > If the lock was only required for the obj hold I would > have expected both of the above to do so as it would be > cleaner. > > I have confirmed a kernel even with full asserts can > quite happily run without deadlock and no other asserts > are trigged during a send of a snapshot which is mounted, > so there are no obvious issues but always best to ask ;-) > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of misdirection, > the recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to [email protected]. > > > > ------------------------------------------- > illumos-zfs > Archives: https://www.listbox.com/member/archive/182191/=now > RSS Feed: https://www.listbox.com/member/archive/rss/182191/ > 21635000-ebd1d460 > Modify Your Subscription: https://www.listbox.com/ > member/?member_id=21635000&id_secret=21635000-73dc201a > Powered by Listbox: http://www.listbox.com >
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
