Cool, this is what I had in mind, and I think it will work well.  Here are
a few recommendations:

dmu_objset_own_obj() - make this share code with dmu_objset_own() (e.g. add
a dmu_objset_own_impl() to do the shared logic)

dmu_objset_find_parallel - assert that flags doesn't have FIND_SNAPSHOTS
dmu_objset_find_parallel_impl - it might make sense to call the callback
before doing taskq_dispatch, that way we can say that the traversal is
pre-order (callback on parent is executed before callbacks on children).

It would be really great to extend this to be a replacement for
dmu_objset_find_dp(), perhaps with a flag for single-threaded callbacks
(i.e. create the taskq with just one thread).  I took a look at all the
consumers of dmu_objset_find_dp(), and I think they would all work fine
with pre-order iteration (as you're doing) rather than post-order.

vdev_count_leaves_impl - can this be "static"?

zil.c lines 639, 698: incorrect indentation (should be 2 tabs + 4 spaces)

It's conventional to use %llu and then cast the argument to "unsigned long
long", as opposed to using PRIu64.

--matt



On Tue, Oct 28, 2014 at 1:59 AM, Arne Jansen <[email protected]> wrote:

> Like this?
>
> http://cr.illumos.org/~webrev/sensille/find_parallel_dp/
>
> -Arne
>
> On 10/16/2014 10:12 PM, Matthew Ahrens via illumos-developer wrote:
> > (resend to cc a few more lists -- sorry)
> >
> > I think the overall idea of this change is sound -- and that's a great
> > performance improvement.
> >
> > Can you change dmu_objset_find_parallel() to work like
> dmu_objset_find_dp(), and
> > then change zil_check_log_chain(0 and zil_claim() to work off of the
> > dsl_dataset_t*, rather than the char*name?  Then I think the other
> threads won't
> > need to grab the namespace lock.
> >
> > --matt
> >
> > On Thu, Oct 16, 2014 at 12:05 PM, Arne Jansen via illumos-developer
> > <[email protected] <mailto:[email protected]>>
> wrote:
> >
> >     In our setup zpool import takes a very long time (about 15 minutes)
> due
> >     to a large count of filesystems (> 5000).
> >     During this time, you can see the disks at only a few percent busy in
> >     iostat, while the pool is at 100%, which indicates a single-threaded
> >     import process.
> >     What's causing this is checking of the zil chains for each
> filesystem.
> >     Each filesystem is visisted twice, once to check the chain and once
> >     to claim it. This is done in a sequential manner.
> >     I've created a proof-of-concept patch that switches this process to a
> >     parallel enumeration using a taskq.
> >     In my setup this speeds up the import process by a factor of 20,
> >     bringing all disks (30 in my case) to nearly 100% busy.
> >     There's only one problem: locking. spa_open is called with
> >     spa_namespace_lock held. With this, later on zil_check_log_chain and
> >     zil_claim are called. These call in turn spa_open, which tries to
> claim
> >     spa_namespace_lock again. This problem is not new. The current
> solution
> >     looks like this:
> >
> >     /*
> >      * As disgusting as this is, we need to support recursive calls to
> this
> >      * function because dsl_dir_open() is called during spa_load(), and
> ends
> >      * up calling spa_open() again.  The real fix is to figure out how to
> >      * avoid dsl_dir_open() calling this in the first place.
> >      */
> >     if (mutex_owner(&spa_namespace___lock) != curthread) {
> >             mutex_enter(&spa_namespace___lock);
> >             locked = B_TRUE;
> >     }
> >
> >     This doesn't work anymore when I call zil_claim/check_log_chain
> through
> >     a taskq and thus from a different thread. My current hacky solution
> is
> >     to pass a flag through the call chain to spa_open to signal that the
> >     lock isn't needed. It works, but it doesn't look too nice...
> >
> >     My question now is how to build a cleaner solution. The snippet above
> >     is already ugly in itself, so it would be good to get rid of it at
> all.
> >     The functions involved in the call chains are all very commonly used
> >     functions. Changing the signature of those will probably give a quite
> >     bulky patch, so I'd prefer to find a nice small unintrusive way.
> >
> >     The patch currently looks like this:
> >
> >     http://cr.illumos.org/~webrev/__sensille/find_parallel/
> >     <http://cr.illumos.org/~webrev/sensille/find_parallel/>
> >
> >     Hopefully you can give me some hints on how to solve this recursive
> >     locking riddle...
> >
> >     -Arne
> >
> >
> >
> >     ------------------------------__-------------
> >     illumos-developer
> >     Archives: https://www.listbox.com/__member/archive/182179/=now
> >     <https://www.listbox.com/member/archive/182179/=now>
> >     RSS Feed:
> >
> https://www.listbox.com/__member/archive/rss/182179/__21175174-cd73734d
> >     <https://www.listbox.com/member/archive/rss/182179/21175174-cd73734d
> >
> >     Modify Your Subscription:
> >     https://www.listbox.com/__member/?&id___secret=21175174-792643f6
> >     <https://www.listbox.com/member/?&;>
> >     Powered by Listbox: http://www.listbox.com
> >
> >
> > *illumos-developer* | Archives
> > <https://www.listbox.com/member/archive/182179/=now>
> > <https://www.listbox.com/member/archive/rss/182179/21501168-10807f51> |
> Modify
> > <
> https://www.listbox.com/member/?member_id=21501168&id_secret=21501168-469bb5e1
> >
> > Your Subscription     [Powered by Listbox] <http://www.listbox.com>
> >
>
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to