On Fri, Oct 31, 2014 at 2:24 AM, Arne Jansen <[email protected]> wrote:

> Ok, we're getting closer. Next version still at:
>
> http://cr.illumos.org/~webrev/sensille/find_parallel_dp/
>
> On 10/29/2014 03:24 AM, Matthew Ahrens wrote:
> > dmu_objset_find_parallel_impl - it might make sense to call the callback
> before
> > doing taskq_dispatch, that way we can say that the traversal is pre-order
> > (callback on parent is executed before callbacks on children).
>
> This would put the callback in the hot path and could slow down the
> process a
> bit. I'd rather only do it if someone needs it.
>

> >
> > It would be really great to extend this to be a replacement for
> > dmu_objset_find_dp(), perhaps with a flag for single-threaded callbacks
> (i.e.
> > create the taskq with just one thread).  I took a look at all the
> consumers of
> > dmu_objset_find_dp(), and I think they would all work fine with pre-order
> > iteration (as you're doing) rather than post-order.
>
> Done. Again, I left out the single-thread flag until someone needs it.
> Unfortunately I have no good way to test this. In my previous
> implementation
> I was quite confident that it works fine, but I didn't test the other
> consumers
> of dmu_objset_find_dp. Maybe I have to take a look at the testsuite...
>

We will definitely need to test all of the callers.  Unless they are
modified, some of the callers will require single-threaded execution of
their callbacks (i.e. create taskq with one thread).

--matt


>
> -Arne
>
>
> >
> > vdev_count_leaves_impl - can this be "static"?
> >
> > zil.c lines 639, 698: incorrect indentation (should be 2 tabs + 4 spaces)
> >
> > It's conventional to use %llu and then cast the argument to "unsigned
> long
> > long", as opposed to using PRIu64.
> >
> > --matt
> >
> >
> >
> > On Tue, Oct 28, 2014 at 1:59 AM, Arne Jansen <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Like this?
> >
> >     http://cr.illumos.org/~webrev/sensille/find_parallel_dp/
> >
> >     -Arne
> >
> >     On 10/16/2014 10:12 PM, Matthew Ahrens via illumos-developer wrote:
> >     > (resend to cc a few more lists -- sorry)
> >     >
> >     > I think the overall idea of this change is sound -- and that's a
> great
> >     > performance improvement.
> >     >
> >     > Can you change dmu_objset_find_parallel() to work like
> dmu_objset_find_dp(), and
> >     > then change zil_check_log_chain(0 and zil_claim() to work off of
> the
> >     > dsl_dataset_t*, rather than the char*name?  Then I think the other
> threads won't
> >     > need to grab the namespace lock.
> >     >
> >     > --matt
> >     >
> >     > On Thu, Oct 16, 2014 at 12:05 PM, Arne Jansen via illumos-developer
> >     > <[email protected] <mailto:[email protected]>
> >     <mailto:[email protected] <mailto:
> [email protected]>>>
> >     wrote:
> >     >
> >     >     In our setup zpool import takes a very long time (about 15
> minutes) due
> >     >     to a large count of filesystems (> 5000).
> >     >     During this time, you can see the disks at only a few percent
> busy in
> >     >     iostat, while the pool is at 100%, which indicates a
> single-threaded
> >     >     import process.
> >     >     What's causing this is checking of the zil chains for each
> filesystem.
> >     >     Each filesystem is visisted twice, once to check the chain and
> once
> >     >     to claim it. This is done in a sequential manner.
> >     >     I've created a proof-of-concept patch that switches this
> process to a
> >     >     parallel enumeration using a taskq.
> >     >     In my setup this speeds up the import process by a factor of
> 20,
> >     >     bringing all disks (30 in my case) to nearly 100% busy.
> >     >     There's only one problem: locking. spa_open is called with
> >     >     spa_namespace_lock held. With this, later on
> zil_check_log_chain and
> >     >     zil_claim are called. These call in turn spa_open, which tries
> to claim
> >     >     spa_namespace_lock again. This problem is not new. The current
> solution
> >     >     looks like this:
> >     >
> >     >     /*
> >     >      * As disgusting as this is, we need to support recursive
> calls to this
> >     >      * function because dsl_dir_open() is called during
> spa_load(), and ends
> >     >      * up calling spa_open() again.  The real fix is to figure out
> how to
> >     >      * avoid dsl_dir_open() calling this in the first place.
> >     >      */
> >     >     if (mutex_owner(&spa_namespace___lock) != curthread) {
> >     >             mutex_enter(&spa_namespace___lock);
> >     >             locked = B_TRUE;
> >     >     }
> >     >
> >     >     This doesn't work anymore when I call
> zil_claim/check_log_chain through
> >     >     a taskq and thus from a different thread. My current hacky
> solution is
> >     >     to pass a flag through the call chain to spa_open to signal
> that the
> >     >     lock isn't needed. It works, but it doesn't look too nice...
> >     >
> >     >     My question now is how to build a cleaner solution. The
> snippet above
> >     >     is already ugly in itself, so it would be good to get rid of
> it at all.
> >     >     The functions involved in the call chains are all very
> commonly used
> >     >     functions. Changing the signature of those will probably give
> a quite
> >     >     bulky patch, so I'd prefer to find a nice small unintrusive
> way.
> >     >
> >     >     The patch currently looks like this:
> >     >
> >     >     http://cr.illumos.org/~webrev/__sensille/find_parallel/
> >     >     <http://cr.illumos.org/~webrev/sensille/find_parallel/>
> >     >
> >     >     Hopefully you can give me some hints on how to solve this
> recursive
> >     >     locking riddle...
> >     >
> >     >     -Arne
> >     >
> >     >
> >     >
> >     >     ------------------------------__-------------
> >     >     illumos-developer
> >     >     Archives: https://www.listbox.com/__member/archive/182179/=now
> >     >     <https://www.listbox.com/member/archive/182179/=now>
> >     >     RSS Feed:
> >     >
> https://www.listbox.com/__member/archive/rss/182179/__21175174-cd73734d
> >     >     <
> https://www.listbox.com/member/archive/rss/182179/21175174-cd73734d>
> >     >     Modify Your Subscription:
> >     >
> https://www.listbox.com/__member/?&id___secret=21175174-792643f6
> >     >     <https://www.listbox.com/member/?&;>
> >     >     Powered by Listbox: http://www.listbox.com
> >     >
> >     >
> >     > *illumos-developer* | Archives
> >     > <https://www.listbox.com/member/archive/182179/=now>
> >     > <
> https://www.listbox.com/member/archive/rss/182179/21501168-10807f51> |
> Modify
> >     >
> >     <
> https://www.listbox.com/member/?member_id=21501168&id_secret=21501168-469bb5e1
> >
> >     > Your Subscription     [Powered by Listbox] <http://www.listbox.com
> >
> >     >
> >
> >
>
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to