On 29.11.2013 23:36, Saso Kiselkov wrote:
On 11/29/13, 6:40 PM, Alexander Motin wrote:
On 29.11.2013 19:04, Saso Kiselkov wrote:
On 11/29/13, 3:00 PM, Alexander Motin wrote:
On 29.11.2013 16:49, Saso Kiselkov wrote:
On 11/29/13, 2:31 PM, Alexander Motin wrote:
On 29.11.2013 16:16, Saso Kiselkov wrote:
On 11/29/13, 2:00 PM, Alexander Motin wrote:
Hi.

Running some benchmarks and profiles for backing block storage
(iSCSI,
etc) with ZFS files with recordsize=4K on 24-core FreeBSD machine
I've
noticed significant lock congestion on ZFS SPA config locks inside
spa_config_enter() and spa_config_exit().  The source of it is
numerous
bp_get_dsize() calls, which acquire and drop SCL_VDEV reader lock,
while
called for every written block. Even if the lock scope is very
small, so
many acquisitions predictably cause congestion. The more CPUs
system has
the heavier congestion is. And since these locks are adaptive on
FreeBSD
-- I am getting heavy lock spinning, burning up to half of the CPU
time.

Is this problem known on other platforms?

I've made a patch to replace mutex locks there with rwlocks,
acquiring
them for read in case SPA config lock is requested for read. On my
tests
this change doubles benchmark results and completely removes lock
congestion from SPA config locks since write acquisitions there
don't
happen so often.

On FreeBSD, due to difference in memory allocation semantics, both
Solaris mutex and rwlock primitives are in fact emulated with the
same
sxlock primitive now, so my patch just uses functionality that is
there
any way. On illumos I suppose there is some difference, but I guess
this
patch still should give benefits since these accesses indeed can be
shared and that is what shared locks are for.

Any comments about:
http://people.freebsd.org/~mav/spa_shared.patch ?

How do you handle the cv_wait usage of scl_lock in spa_misc.c? Is
FreeBSD agnostic to using a rwlock instead of a mutex in condition
variable operations?

Yes, cv_wait() is lock type agnostic on FreeBSD.

Hm, taking a step back, I'm wondering if it wouldn't make sense to
rewrite spa_config_enter/spa_config_exit completely in terms of
rwlock_t's instead of doing our own "poor man's" implementation of
them.
As far as I can discern it, the semantics appear to be identical.

AFAIK the main problem is that those SPA config locks may be acquired by
one thread, while dropped by another. At least on FreeBSD system locking
primitives don't allow that.

This appears to work: http://37.153.99.61/spa_config_rwlock/
The only kind-of special requirement for this code is that rw_write_held
test that the lock is held by not just any thread, but by curthread
(mirroring mutex_held() in a sense).

Then probably we need somebody who knows history of this code in Solaris
or detailed rwlocks semantics.  From quick look on illumos code I
haven't found good reason why reader unlock from different thread may
not work (while for writer that is denied even in present code), but I
suppose there should be some good reason to implement things in this
custom way.

The only "good" reason I can think of is read-write locks not being
present in the kernel when this was written. Other than that, the
spa_config_lock semantics are exactly the same as in krwlock_t.

Is there some kind of lock debugging facility in illumos kernel?

You'll have to be a bit more specific. There's plenty of lock analysis
and debugging tools, but that's really not what we're after here. We're
after the semantics.

I was talking about equivalent of FreeBSD WITTNESS kernel facility that detects lock ordering violations and other debugging tools like that. They may get upset if lock is not dropped on the same thread where acquired. At least FreeBSD's one would definitely be.

I think I understand the krwlock_t semantics from
looking at the code (and they appear identical to FreeBSD's, though I
haven't looked at Linux), and as far as I can discern, the code in that
webrev gives the exact same behavior as the legacy spa_config_locks
implementation.

Speaking about semantics, it may differ in some details. I may be wrong, but I think that FreeBSD SX locks used for ZFS at the moment don't give preference to the writer (at least I haven't found that), that is critical for the case. Reader locks may overlap indefinitely in this case and never allow writer to run if it is not explicitly enforced.

Are they agree to live with this too?

If it improves performance and you can demonstrate that in testing (some
info on the tests you ran + results and possibly nice graphs would be
greatly appreciated), then I have no doubt Illumos will welcome this.
After all, why not?

I was talking about technical mechanisms, not people here.

--
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to