On 11/29/13, 6:40 PM, Alexander Motin wrote: > On 29.11.2013 19:04, Saso Kiselkov wrote: >> On 11/29/13, 3:00 PM, Alexander Motin wrote: >>> On 29.11.2013 16:49, Saso Kiselkov wrote: >>>> On 11/29/13, 2:31 PM, Alexander Motin wrote: >>>>> On 29.11.2013 16:16, Saso Kiselkov wrote: >>>>>> On 11/29/13, 2:00 PM, Alexander Motin wrote: >>>>>>> Hi. >>>>>>> >>>>>>> Running some benchmarks and profiles for backing block storage >>>>>>> (iSCSI, >>>>>>> etc) with ZFS files with recordsize=4K on 24-core FreeBSD machine >>>>>>> I've >>>>>>> noticed significant lock congestion on ZFS SPA config locks inside >>>>>>> spa_config_enter() and spa_config_exit(). The source of it is >>>>>>> numerous >>>>>>> bp_get_dsize() calls, which acquire and drop SCL_VDEV reader lock, >>>>>>> while >>>>>>> called for every written block. Even if the lock scope is very >>>>>>> small, so >>>>>>> many acquisitions predictably cause congestion. The more CPUs >>>>>>> system has >>>>>>> the heavier congestion is. And since these locks are adaptive on >>>>>>> FreeBSD >>>>>>> -- I am getting heavy lock spinning, burning up to half of the CPU >>>>>>> time. >>>>>>> >>>>>>> Is this problem known on other platforms? >>>>>>> >>>>>>> I've made a patch to replace mutex locks there with rwlocks, >>>>>>> acquiring >>>>>>> them for read in case SPA config lock is requested for read. On my >>>>>>> tests >>>>>>> this change doubles benchmark results and completely removes lock >>>>>>> congestion from SPA config locks since write acquisitions there >>>>>>> don't >>>>>>> happen so often. >>>>>>> >>>>>>> On FreeBSD, due to difference in memory allocation semantics, both >>>>>>> Solaris mutex and rwlock primitives are in fact emulated with the >>>>>>> same >>>>>>> sxlock primitive now, so my patch just uses functionality that is >>>>>>> there >>>>>>> any way. On illumos I suppose there is some difference, but I guess >>>>>>> this >>>>>>> patch still should give benefits since these accesses indeed can be >>>>>>> shared and that is what shared locks are for. >>>>>>> >>>>>>> Any comments about: >>>>>>> http://people.freebsd.org/~mav/spa_shared.patch ? >>>>>> >>>>>> How do you handle the cv_wait usage of scl_lock in spa_misc.c? Is >>>>>> FreeBSD agnostic to using a rwlock instead of a mutex in condition >>>>>> variable operations? >>>>> >>>>> Yes, cv_wait() is lock type agnostic on FreeBSD. >>>> >>>> Hm, taking a step back, I'm wondering if it wouldn't make sense to >>>> rewrite spa_config_enter/spa_config_exit completely in terms of >>>> rwlock_t's instead of doing our own "poor man's" implementation of >>>> them. >>>> As far as I can discern it, the semantics appear to be identical. >>> >>> AFAIK the main problem is that those SPA config locks may be acquired by >>> one thread, while dropped by another. At least on FreeBSD system locking >>> primitives don't allow that. >> >> This appears to work: http://37.153.99.61/spa_config_rwlock/ >> The only kind-of special requirement for this code is that rw_write_held >> test that the lock is held by not just any thread, but by curthread >> (mirroring mutex_held() in a sense). > > Then probably we need somebody who knows history of this code in Solaris > or detailed rwlocks semantics. From quick look on illumos code I > haven't found good reason why reader unlock from different thread may > not work (while for writer that is denied even in present code), but I > suppose there should be some good reason to implement things in this > custom way.
The only "good" reason I can think of is read-write locks not being present in the kernel when this was written. Other than that, the spa_config_lock semantics are exactly the same as in krwlock_t. > Is there some kind of lock debugging facility in illumos kernel? You'll have to be a bit more specific. There's plenty of lock analysis and debugging tools, but that's really not what we're after here. We're after the semantics. I think I understand the krwlock_t semantics from looking at the code (and they appear identical to FreeBSD's, though I haven't looked at Linux), and as far as I can discern, the code in that webrev gives the exact same behavior as the legacy spa_config_locks implementation. > Are they agree to live with this too? If it improves performance and you can demonstrate that in testing (some info on the tests you ran + results and possibly nice graphs would be greatly appreciated), then I have no doubt Illumos will welcome this. After all, why not? Cheers, -- Saso _______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
