Hi Richard,
Can you please share the workarounds for specific configurations to speed
up the export time?
Thanks.
On 2014/10/29 0:44, Richard Elling via illumos-zfs wrote:
Hi Chip,
On Oct 28, 2014, at 7:42 AM, Schweiss, Chip via illumos-zfs
<[email protected]> wrote:
Andy,
Count me as interested.
Pool import time is a real problem in the HA setups I maintain.
While it may be HA, failover events still need careful planning
because they take upwards of 10 minutes, which is enough to cause
application timeouts, especially for web applications.
Is the long time at import or export? If the latter, there are some
workarounds for specific configurations that may apply.
-- richard
Pool discovery with over 240 devices attached also typically causes
service timeouts at boot.
Thank you for your efforts on this. Hopefully your work can make it
through to a release.
By <= 32 cores, does that include hyper-threaded cores? Obviously,
with the new Haswell Xeons, even 32 full cores are possible on 2-CPU
systems now.
I'm currently building a new system that will have 24 cores + 24
hyper-threaded cores, so there may be issues. This system will be
in development mode until January; I'd be happy to test code if
you need a test bed.
-Chip
On Tue, Oct 28, 2014 at 5:07 AM, Andy Stormont via illumos-developer
<[email protected]> wrote:
The import code in libzfs could also use some work. If I remember
correctly, the code that examines disk slices is O(n * n). Though it’s
broken, it seems to work okay on machines with <= 32 cores, but with
more than that you’ll likely run into the assert at the end of
zpool_open_func, which happens when a thread examining a slice realises
another thread has already marked the slice as not containing a pool
after it’s started.
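For illustration only, here is a minimal userland sketch of that failure
mode; it is not the libzfs code, and probe_state_t/probe_slice() are
invented names, but it shows how an "I must be the first to mark this"
assertion breaks once slices of the same disk are probed in parallel
(running it with two threads trips the assert):

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for one disk whose slices are probed. */
typedef struct probe_state {
        pthread_mutex_t lock;
        bool no_pool;   /* set once any slice shows there is no pool */
} probe_state_t;

static void *
probe_slice(void *arg)
{
        probe_state_t *ps = arg;

        /* ... read and parse this slice's label here (elided) ... */

        pthread_mutex_lock(&ps->lock);
        /*
         * Fragile assumption: "no other thread has marked this disk
         * yet".  With one worker per slice, the second worker to get
         * here trips the assertion, much like the assert at the end
         * of zpool_open_func.
         */
        assert(!ps->no_pool);
        ps->no_pool = true;
        pthread_mutex_unlock(&ps->lock);
        return (NULL);
}

int
main(void)
{
        probe_state_t ps = { PTHREAD_MUTEX_INITIALIZER, false };
        pthread_t t[2];

        for (int i = 0; i < 2; i++)
                pthread_create(&t[i], NULL, probe_slice, &ps);
        for (int i = 0; i < 2; i++)
                pthread_join(t[i], NULL);
        return (0);
}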
I had a go at rewriting the code earlier this year to fix those
issues, which you can find here:
http://cr.illumos.org/~webrev/andy_js/zpool/
I’m not sure if this code works or even if it’s the latest version,
but if there’s interest I can start looking at it again.
Andy.
On 28 Oct 2014, at 08:59, Arne Jansen via illumos-developer
<[email protected]> wrote:
Like this?
http://cr.illumos.org/~webrev/sensille/find_parallel_dp/
-Arne
On 10/16/2014 10:12 PM, Matthew Ahrens via illumos-developer wrote:
(resend to cc a few more lists -- sorry)
I think the overall idea of this change is sound -- and that's a great
performance improvement.
Can you change dmu_objset_find_parallel() to work like
dmu_objset_find_dp(), and then change zil_check_log_chain() and
zil_claim() to work off of the dsl_dataset_t *, rather than the
char *name? Then I think the other threads won't need to grab the
namespace lock.
--matt
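For illustration only, a rough, hedged sketch of what such
dataset-based callbacks could look like; this is neither the proposal
verbatim nor final illumos code, and zil_claim_cb() plus the elided ZIL
work are assumptions, but it shows the shape dmu_objset_find_dp()
expects: a callback that receives an already-held dsl_dataset_t * and
therefore never needs to call spa_open() itself.

/*
 * Hypothetical callback with the dmu_objset_find_dp()-style signature
 * int (*)(dsl_pool_t *, dsl_dataset_t *, void *).  The dataset is held
 * by the enumeration code, so no name lookup, no spa_open(), and no
 * spa_namespace_lock are needed here.
 */
static int
zil_claim_cb(dsl_pool_t *dp, dsl_dataset_t *ds, void *txarg)
{
        objset_t *os;

        if (dmu_objset_from_ds(ds, &os) != 0)
                return (0);
        /* ... check or claim the ZIL chain for this objset (elided) ... */
        return (0);
}

/* Driven from the import path, roughly:                               */
/* (void) dmu_objset_find_dp(dp, dp->dp_root_dir_obj, zil_claim_cb,    */
/*     tx, DS_FIND_CHILDREN);                                          */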
On Thu, Oct 16, 2014 at 12:05 PM, Arne Jansen via illumos-developer
<[email protected]> wrote:
In our setup zpool import takes a very long time (about 15 minutes)
due to a large number of filesystems (> 5000).
During this time, you can see the disks at only a few percent busy in
iostat, while the pool is at 100%, which indicates a single-threaded
import process.
What's causing this is the checking of the ZIL chains for each
filesystem. Each filesystem is visited twice, once to check the chain
and once to claim it. This is done in a sequential manner.
I've created a proof-of-concept patch that switches this process to a
parallel enumeration using a taskq.
In my setup this speeds up the import process by a factor of 20,
bringing all disks (30 in my case) to nearly 100% busy.
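For illustration only, a hedged sketch of that taskq approach; this is
not the actual patch, and zil_task_t, zil_check_task() and
zil_check_all_parallel() are invented names, but it shows how the
per-filesystem ZIL work could be dispatched to a taskq and waited on
instead of being run sequentially:

/*
 * Hypothetical per-dataset work item.  The real patch would carry
 * whatever zil_check_log_chain()/zil_claim() need.
 */
typedef struct zil_task {
        char    zt_name[MAXNAMELEN];
        void    *zt_tx;
} zil_task_t;

static void
zil_check_task(void *arg)
{
        zil_task_t *zt = arg;

        /* ... zil_check_log_chain()/zil_claim() for zt->zt_name ... */
        kmem_free(zt, sizeof (zil_task_t));
}

static void
zil_check_all_parallel(void)
{
        /* 8 worker threads picked arbitrarily for this sketch. */
        taskq_t *tq = taskq_create("zil_claim", 8, minclsyspri, 8, 8,
            TASKQ_PREPOPULATE);

        /*
         * For each filesystem: allocate and fill in a zil_task_t *zt,
         * then hand it to the taskq:
         *
         *      (void) taskq_dispatch(tq, zil_check_task, zt, TQ_SLEEP);
         */

        taskq_wait(tq);         /* wait for every dispatched check */
        taskq_destroy(tq);
}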
There's only one problem: locking. spa_open is called with
spa_namespace_lock held. With this held, zil_check_log_chain and
zil_claim are called later on. These in turn call spa_open, which
tries to take spa_namespace_lock again. This problem is not new.
The current solution looks like this:
/*
 * As disgusting as this is, we need to support recursive calls to this
 * function because dsl_dir_open() is called during spa_load(), and ends
 * up calling spa_open() again. The real fix is to figure out how to
 * avoid dsl_dir_open() calling this in the first place.
 */
if (mutex_owner(&spa_namespace_lock) != curthread) {
        mutex_enter(&spa_namespace_lock);
        locked = B_TRUE;
}
This doesn't work anymore when I call zil_claim/zil_check_log_chain
through a taskq and thus from a different thread. My current hacky
solution is to pass a flag through the call chain to spa_open to
signal that the lock isn't needed. It works, but it doesn't look too
nice...
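For illustration only, a hedged sketch of that flag-passing workaround;
spa_open_flags() and its lock_held parameter are invented for this
example and are not the actual patch:

/*
 * Hypothetical variant of the spa_open() entry check: the taskq
 * workers pass lock_held = B_TRUE to say "spa_namespace_lock is
 * already held on our behalf by the thread that dispatched us", so
 * they neither re-enter the mutex nor trip over the owner check.
 */
static int
spa_open_flags(const char *name, spa_t **spapp, void *tag,
    boolean_t lock_held)
{
        boolean_t locked = B_FALSE;

        if (!lock_held && mutex_owner(&spa_namespace_lock) != curthread) {
                mutex_enter(&spa_namespace_lock);
                locked = B_TRUE;
        }

        /* ... the existing spa_open() body (elided) ... */

        if (locked)
                mutex_exit(&spa_namespace_lock);
        return (0);
}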
My question now is how to build a cleaner solution. The snippet above
is already ugly in itself, so it would be good to get rid of it
altogether. The functions involved in the call chains are all very
commonly used functions. Changing the signature of those will probably
result in quite a bulky patch, so I'd prefer to find a small,
unintrusive way.
The patch currently looks like this:
http://cr.illumos.org/~webrev/sensille/find_parallel/
Hopefully you can give me some hints on how to solve this recursive
locking riddle...
-Arne