On 11/29/13, 6:40 PM, Alexander Motin wrote:
> On 29.11.2013 19:04, Saso Kiselkov wrote:
>> On 11/29/13, 3:00 PM, Alexander Motin wrote:
>>> On 29.11.2013 16:49, Saso Kiselkov wrote:
>>>> On 11/29/13, 2:31 PM, Alexander Motin wrote:
>>>>> On 29.11.2013 16:16, Saso Kiselkov wrote:
>>>>>> On 11/29/13, 2:00 PM, Alexander Motin wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> Running some benchmarks and profiles for backing block storage
>>>>>>> (iSCSI,
>>>>>>> etc) with ZFS files with recordsize=4K on 24-core FreeBSD machine
>>>>>>> I've
>>>>>>> noticed significant lock congestion on ZFS SPA config locks inside
>>>>>>> spa_config_enter() and spa_config_exit().  The source of it is
>>>>>>> numerous
>>>>>>> bp_get_dsize() calls, which acquire and drop SCL_VDEV reader lock,
>>>>>>> while
>>>>>>> called for every written block. Even if the lock scope is very
>>>>>>> small, so
>>>>>>> many acquisitions predictably cause congestion. The more CPUs
>>>>>>> system has
>>>>>>> the heavier congestion is. And since these locks are adaptive on
>>>>>>> FreeBSD
>>>>>>> -- I am getting heavy lock spinning, burning up to half of the CPU
>>>>>>> time.
>>>>>>>
>>>>>>> Is this problem known on other platforms?
>>>>>>>
>>>>>>> I've made a patch to replace mutex locks there with rwlocks,
>>>>>>> acquiring
>>>>>>> them for read in case SPA config lock is requested for read. On my
>>>>>>> tests
>>>>>>> this change doubles benchmark results and completely removes lock
>>>>>>> congestion from SPA config locks since write acquisitions there
>>>>>>> don't
>>>>>>> happen so often.
>>>>>>>
>>>>>>> On FreeBSD, due to difference in memory allocation semantics, both
>>>>>>> Solaris mutex and rwlock primitives are in fact emulated with the
>>>>>>> same
>>>>>>> sxlock primitive now, so my patch just uses functionality that is
>>>>>>> there
>>>>>>> any way. On illumos I suppose there is some difference, but I guess
>>>>>>> this
>>>>>>> patch still should give benefits since these accesses indeed can be
>>>>>>> shared and that is what shared locks are for.
>>>>>>>
>>>>>>> Any comments about:
>>>>>>> http://people.freebsd.org/~mav/spa_shared.patch ?
>>>>>>
>>>>>> How do you handle the cv_wait usage of scl_lock in spa_misc.c? Is
>>>>>> FreeBSD agnostic to using a rwlock instead of a mutex in condition
>>>>>> variable operations?
>>>>>
>>>>> Yes, cv_wait() is lock type agnostic on FreeBSD.
>>>>
>>>> Hm, taking a step back, I'm wondering if it wouldn't make sense to
>>>> rewrite spa_config_enter/spa_config_exit completely in terms of
>>>> rwlock_t's instead of doing our own "poor man's" implementation of
>>>> them.
>>>> As far as I can discern it, the semantics appear to be identical.
>>>
>>> AFAIK the main problem is that those SPA config locks may be acquired by
>>> one thread, while dropped by another. At least on FreeBSD system locking
>>> primitives don't allow that.
>>
>> This appears to work: http://37.153.99.61/spa_config_rwlock/
>> The only kind-of special requirement for this code is that rw_write_held
>> test that the lock is held by not just any thread, but by curthread
>> (mirroring mutex_held() in a sense).
> 
> Then probably we need somebody who knows history of this code in Solaris
> or detailed rwlocks semantics.  From quick look on illumos code I
> haven't found good reason why reader unlock from different thread may
> not work (while for writer that is denied even in present code), but I
> suppose there should be some good reason to implement things in this
> custom way.

The only "good" reason I can think of is read-write locks not being
present in the kernel when this was written. Other than that, the
spa_config_lock semantics are exactly the same as in krwlock_t.

> Is there some kind of lock debugging facility in illumos kernel?

You'll have to be a bit more specific. There's plenty of lock analysis
and debugging tools, but that's really not what we're after here. We're
after the semantics. I think I understand the krwlock_t semantics from
looking at the code (and they appear identical to FreeBSD's, though I
haven't looked at Linux), and as far as I can discern, the code in that
webrev gives the exact same behavior as the legacy spa_config_locks
implementation.

> Are they agree to live with this too?

If it improves performance and you can demonstrate that in testing (some
info on the tests you ran + results and possibly nice graphs would be
greatly appreciated), then I have no doubt Illumos will welcome this.
After all, why not?

Cheers,
-- 
Saso
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to