On 11/29/13, 10:33 PM, Alexander Motin wrote: > On 29.11.2013 23:36, Saso Kiselkov wrote: >> On 11/29/13, 6:40 PM, Alexander Motin wrote: >>> On 29.11.2013 19:04, Saso Kiselkov wrote: >>>> On 11/29/13, 3:00 PM, Alexander Motin wrote: >>>>> On 29.11.2013 16:49, Saso Kiselkov wrote: >>>>>> On 11/29/13, 2:31 PM, Alexander Motin wrote: >>>>>>> On 29.11.2013 16:16, Saso Kiselkov wrote: >>>>>>>> On 11/29/13, 2:00 PM, Alexander Motin wrote: >>>>>>>>> Hi. >>>>>>>>> >>>>>>>>> Running some benchmarks and profiles for backing block storage >>>>>>>>> (iSCSI, >>>>>>>>> etc) with ZFS files with recordsize=4K on 24-core FreeBSD machine >>>>>>>>> I've >>>>>>>>> noticed significant lock congestion on ZFS SPA config locks inside >>>>>>>>> spa_config_enter() and spa_config_exit(). The source of it is >>>>>>>>> numerous >>>>>>>>> bp_get_dsize() calls, which acquire and drop SCL_VDEV reader lock, >>>>>>>>> while >>>>>>>>> called for every written block. Even if the lock scope is very >>>>>>>>> small, so >>>>>>>>> many acquisitions predictably cause congestion. The more CPUs >>>>>>>>> system has >>>>>>>>> the heavier congestion is. And since these locks are adaptive on >>>>>>>>> FreeBSD >>>>>>>>> -- I am getting heavy lock spinning, burning up to half of the CPU >>>>>>>>> time. >>>>>>>>> >>>>>>>>> Is this problem known on other platforms? >>>>>>>>> >>>>>>>>> I've made a patch to replace mutex locks there with rwlocks, >>>>>>>>> acquiring >>>>>>>>> them for read in case SPA config lock is requested for read. On my >>>>>>>>> tests >>>>>>>>> this change doubles benchmark results and completely removes lock >>>>>>>>> congestion from SPA config locks since write acquisitions there >>>>>>>>> don't >>>>>>>>> happen so often. >>>>>>>>> >>>>>>>>> On FreeBSD, due to difference in memory allocation semantics, both >>>>>>>>> Solaris mutex and rwlock primitives are in fact emulated with the >>>>>>>>> same >>>>>>>>> sxlock primitive now, so my patch just uses functionality that is >>>>>>>>> there >>>>>>>>> any way. On illumos I suppose there is some difference, but I >>>>>>>>> guess >>>>>>>>> this >>>>>>>>> patch still should give benefits since these accesses indeed >>>>>>>>> can be >>>>>>>>> shared and that is what shared locks are for. >>>>>>>>> >>>>>>>>> Any comments about: >>>>>>>>> http://people.freebsd.org/~mav/spa_shared.patch ? >>>>>>>> >>>>>>>> How do you handle the cv_wait usage of scl_lock in spa_misc.c? Is >>>>>>>> FreeBSD agnostic to using a rwlock instead of a mutex in condition >>>>>>>> variable operations? >>>>>>> >>>>>>> Yes, cv_wait() is lock type agnostic on FreeBSD. >>>>>> >>>>>> Hm, taking a step back, I'm wondering if it wouldn't make sense to >>>>>> rewrite spa_config_enter/spa_config_exit completely in terms of >>>>>> rwlock_t's instead of doing our own "poor man's" implementation of >>>>>> them. >>>>>> As far as I can discern it, the semantics appear to be identical. >>>>> >>>>> AFAIK the main problem is that those SPA config locks may be >>>>> acquired by >>>>> one thread, while dropped by another. At least on FreeBSD system >>>>> locking >>>>> primitives don't allow that. >>>> >>>> This appears to work: http://37.153.99.61/spa_config_rwlock/ >>>> The only kind-of special requirement for this code is that >>>> rw_write_held >>>> test that the lock is held by not just any thread, but by curthread >>>> (mirroring mutex_held() in a sense). >>> >>> Then probably we need somebody who knows history of this code in Solaris >>> or detailed rwlocks semantics. From quick look on illumos code I >>> haven't found good reason why reader unlock from different thread may >>> not work (while for writer that is denied even in present code), but I >>> suppose there should be some good reason to implement things in this >>> custom way. >> >> The only "good" reason I can think of is read-write locks not being >> present in the kernel when this was written. Other than that, the >> spa_config_lock semantics are exactly the same as in krwlock_t. >> >>> Is there some kind of lock debugging facility in illumos kernel? >> >> You'll have to be a bit more specific. There's plenty of lock analysis >> and debugging tools, but that's really not what we're after here. We're >> after the semantics. > > I was talking about equivalent of FreeBSD WITTNESS kernel facility that > detects lock ordering violations and other debugging tools like that. > They may get upset if lock is not dropped on the same thread where > acquired. At least FreeBSD's one would definitely be.
Ah, understand. Well, I'm not sure. It is true that spa_config_locks are sometimes dropped in different threads from those where they were acquired. > Speaking about semantics, it may differ in some details. I may be wrong, > but I think that FreeBSD SX locks used for ZFS at the moment don't give > preference to the writer (at least I haven't found that), that is > critical for the case. Reader locks may overlap indefinitely in this > case and never allow writer to run if it is not explicitly enforced. If this is the case, then this is a serious bug in the FreeBSD read-write lock implementation and should be remedied irrespective of what we do here. >>> Are they agree to live with this too? >> >> If it improves performance and you can demonstrate that in testing (some >> info on the tests you ran + results and possibly nice graphs would be >> greatly appreciated), then I have no doubt Illumos will welcome this. >> After all, why not? > > I was talking about technical mechanisms, not people here. Then I'm not sure I understand your question. The patch works on Illumos - that's where I developed and tested it (though only quickly, I have to admit). That having been said, if this approach is not usable, then we certainly can't use your approach either, since condition variables on Illumos absolutely cannot work with rwlocks. Cheers, -- Saso _______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
