Cool, this is what I had in mind, and I think it will work well. Here are a few recommendations:
dmu_objset_own_obj() - make this share code with dmu_objset_own() (e.g. add a dmu_objset_own_impl() to do the shared logic) dmu_objset_find_parallel - assert that flags doesn't have FIND_SNAPSHOTS dmu_objset_find_parallel_impl - it might make sense to call the callback before doing taskq_dispatch, that way we can say that the traversal is pre-order (callback on parent is executed before callbacks on children). It would be really great to extend this to be a replacement for dmu_objset_find_dp(), perhaps with a flag for single-threaded callbacks (i.e. create the taskq with just one thread). I took a look at all the consumers of dmu_objset_find_dp(), and I think they would all work fine with pre-order iteration (as you're doing) rather than post-order. vdev_count_leaves_impl - can this be "static"? zil.c lines 639, 698: incorrect indentation (should be 2 tabs + 4 spaces) It's conventional to use %llu and then cast the argument to "unsigned long long", as opposed to using PRIu64. --matt On Tue, Oct 28, 2014 at 1:59 AM, Arne Jansen <[email protected]> wrote: > Like this? > > http://cr.illumos.org/~webrev/sensille/find_parallel_dp/ > > -Arne > > On 10/16/2014 10:12 PM, Matthew Ahrens via illumos-developer wrote: > > (resend to cc a few more lists -- sorry) > > > > I think the overall idea of this change is sound -- and that's a great > > performance improvement. > > > > Can you change dmu_objset_find_parallel() to work like > dmu_objset_find_dp(), and > > then change zil_check_log_chain(0 and zil_claim() to work off of the > > dsl_dataset_t*, rather than the char*name? Then I think the other > threads won't > > need to grab the namespace lock. > > > > --matt > > > > On Thu, Oct 16, 2014 at 12:05 PM, Arne Jansen via illumos-developer > > <[email protected] <mailto:[email protected]>> > wrote: > > > > In our setup zpool import takes a very long time (about 15 minutes) > due > > to a large count of filesystems (> 5000). > > During this time, you can see the disks at only a few percent busy in > > iostat, while the pool is at 100%, which indicates a single-threaded > > import process. > > What's causing this is checking of the zil chains for each > filesystem. > > Each filesystem is visisted twice, once to check the chain and once > > to claim it. This is done in a sequential manner. > > I've created a proof-of-concept patch that switches this process to a > > parallel enumeration using a taskq. > > In my setup this speeds up the import process by a factor of 20, > > bringing all disks (30 in my case) to nearly 100% busy. > > There's only one problem: locking. spa_open is called with > > spa_namespace_lock held. With this, later on zil_check_log_chain and > > zil_claim are called. These call in turn spa_open, which tries to > claim > > spa_namespace_lock again. This problem is not new. The current > solution > > looks like this: > > > > /* > > * As disgusting as this is, we need to support recursive calls to > this > > * function because dsl_dir_open() is called during spa_load(), and > ends > > * up calling spa_open() again. The real fix is to figure out how to > > * avoid dsl_dir_open() calling this in the first place. > > */ > > if (mutex_owner(&spa_namespace___lock) != curthread) { > > mutex_enter(&spa_namespace___lock); > > locked = B_TRUE; > > } > > > > This doesn't work anymore when I call zil_claim/check_log_chain > through > > a taskq and thus from a different thread. My current hacky solution > is > > to pass a flag through the call chain to spa_open to signal that the > > lock isn't needed. It works, but it doesn't look too nice... > > > > My question now is how to build a cleaner solution. The snippet above > > is already ugly in itself, so it would be good to get rid of it at > all. > > The functions involved in the call chains are all very commonly used > > functions. Changing the signature of those will probably give a quite > > bulky patch, so I'd prefer to find a nice small unintrusive way. > > > > The patch currently looks like this: > > > > http://cr.illumos.org/~webrev/__sensille/find_parallel/ > > <http://cr.illumos.org/~webrev/sensille/find_parallel/> > > > > Hopefully you can give me some hints on how to solve this recursive > > locking riddle... > > > > -Arne > > > > > > > > ------------------------------__------------- > > illumos-developer > > Archives: https://www.listbox.com/__member/archive/182179/=now > > <https://www.listbox.com/member/archive/182179/=now> > > RSS Feed: > > > https://www.listbox.com/__member/archive/rss/182179/__21175174-cd73734d > > <https://www.listbox.com/member/archive/rss/182179/21175174-cd73734d > > > > Modify Your Subscription: > > https://www.listbox.com/__member/?&id___secret=21175174-792643f6 > > <https://www.listbox.com/member/?&> > > Powered by Listbox: http://www.listbox.com > > > > > > *illumos-developer* | Archives > > <https://www.listbox.com/member/archive/182179/=now> > > <https://www.listbox.com/member/archive/rss/182179/21501168-10807f51> | > Modify > > < > https://www.listbox.com/member/?member_id=21501168&id_secret=21501168-469bb5e1 > > > > Your Subscription [Powered by Listbox] <http://www.listbox.com> > > > >
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
