(resend to cc a few more lists -- sorry)

I think the overall idea of this change is sound -- and that's a great
performance improvement.

Can you change dmu_objset_find_parallel() to work like
dmu_objset_find_dp(), and then change zil_check_log_chain(0 and zil_claim()
to work off of the dsl_dataset_t*, rather than the char*name?  Then I think
the other threads won't need to grab the namespace lock.

--matt

On Thu, Oct 16, 2014 at 12:05 PM, Arne Jansen via illumos-developer <
[email protected]> wrote:

> In our setup zpool import takes a very long time (about 15 minutes) due
> to a large count of filesystems (> 5000).
> During this time, you can see the disks at only a few percent busy in
> iostat, while the pool is at 100%, which indicates a single-threaded
> import process.
> What's causing this is checking of the zil chains for each filesystem.
> Each filesystem is visisted twice, once to check the chain and once
> to claim it. This is done in a sequential manner.
> I've created a proof-of-concept patch that switches this process to a
> parallel enumeration using a taskq.
> In my setup this speeds up the import process by a factor of 20,
> bringing all disks (30 in my case) to nearly 100% busy.
> There's only one problem: locking. spa_open is called with
> spa_namespace_lock held. With this, later on zil_check_log_chain and
> zil_claim are called. These call in turn spa_open, which tries to claim
> spa_namespace_lock again. This problem is not new. The current solution
> looks like this:
>
> /*
>  * As disgusting as this is, we need to support recursive calls to this
>  * function because dsl_dir_open() is called during spa_load(), and ends
>  * up calling spa_open() again.  The real fix is to figure out how to
>  * avoid dsl_dir_open() calling this in the first place.
>  */
> if (mutex_owner(&spa_namespace_lock) != curthread) {
>         mutex_enter(&spa_namespace_lock);
>         locked = B_TRUE;
> }
>
> This doesn't work anymore when I call zil_claim/check_log_chain through
> a taskq and thus from a different thread. My current hacky solution is
> to pass a flag through the call chain to spa_open to signal that the
> lock isn't needed. It works, but it doesn't look too nice...
>
> My question now is how to build a cleaner solution. The snippet above
> is already ugly in itself, so it would be good to get rid of it at all.
> The functions involved in the call chains are all very commonly used
> functions. Changing the signature of those will probably give a quite
> bulky patch, so I'd prefer to find a nice small unintrusive way.
>
> The patch currently looks like this:
>
> http://cr.illumos.org/~webrev/sensille/find_parallel/
>
> Hopefully you can give me some hints on how to solve this recursive
> locking riddle...
>
> -Arne
>
>
>
> -------------------------------------------
> illumos-developer
> Archives: https://www.listbox.com/member/archive/182179/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182179/
> 21175174-cd73734d
> Modify Your Subscription: https://www.listbox.com/
> member/?member_id=21175174&id_secret=21175174-792643f6
> Powered by Listbox: http://www.listbox.com
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to