On May 21, 2012, at 2:58 PM, ext Thiago Macieira wrote:

> On Monday, May 21, 2012 08.34.32, bradley.hug...@nokia.com wrote:
>> On May 18, 2012, at 8:34 PM, ext Thiago Macieira wrote:
>>> Recommendations (priority):
>>>
>>> (P0) de-inline QBasicMutex locking functions until we solve some or all of
>>> the below problems
>>
>> I agree with this, so that it gives a chance to fix the performance
>> regressions on Mac at a later date (since it probably won't be fixed before
>> 5.0 is released).
>
> Some notes from the IRC discussion this morning between Olivier, Brad and
> myself:
>
> * QMutex contended performance has dropped considerably on Mac from 4.8 to
>   5.0 (it's 10x slower)
> * QMutex contended performance on Mac is now actually similar to the
>   pthread_mutex_t performance (read: contended QMutex on 4.8 is 10x faster than
>   pthread_mutex_t)
> * changing the QMutex implementation to use the generic Unix codepath on Mac
>   makes it 2x slower
> * the non-Linux code in QBasicMutex::lockInternal is considered complex and
>   hard to read by both Brad and myself
>
> Brad: could you please provide what is, to the best of your knowledge today,
> the combination of tricks that made 4.8 fast?

The trick was the adaptive spin, added and modified over a series of commits
in 4.8. The biggest gain was on Mac; Linux performance didn't change
noticeably, and Windows got a small gain too (as far as I recall).

> * QMutex de-inlining and the Mac performance issues are orthogonal.
> * QMutex "de-inlining" should be understood more correctly as: removing the
>   testAndSet calls from the inline functions. The inline functions should
>   remain inline.
> * The de-inlining is important for Valgrind (helgrind / DRD) to work
>   properly, even in release mode

Lars and I had a conversation in the hallway about QMutex performance on
Windows. It's been a while since I last tested, but I recall that QMutex
didn't outperform CRITICAL_SECTIONs. De-inlining is necessary so that we can
make QMutex nothing more than a wrapper around CRITICAL_SECTION (since the
latter performs better).

So far, we've got 3 votes for de-inlining: Thiago, Lars, and myself. For the
few cases where inlining matters, we can inline inside Qt at those locations
(QMetaObject::activate() would be the first place to check).

> Note that there's another trick that QMutex can apply under valgrind but
> QBasicMutex cannot: if the QMutex constructor initialises the d pointer to
> anything non-null and different from 3, the inlined testAndSet will fail, so
> valgrind can properly hijack the lock and unlock functions.
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel Open Source Technology Center
> Intel Sweden AB - Registration Number: 556189-6027
> Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
> _______________________________________________
> Development mailing list
> Development@qt-project.org
> http://lists.qt-project.org/mailman/listinfo/development

--
Bradley T. Hughes
bradley.hug...@nokia.com
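
For readers not familiar with the term, here is a rough sketch of what
"adaptive spin" combined with an inline test-and-set fast path can look like.
This is NOT the Qt 4.8 QMutex code: the class name AdaptiveSpinLock, the spin
limits and the adjustment steps are all made up for illustration, and the
std::this_thread::yield() fallback is only a stand-in for the real blocking
wait a production mutex would use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration only; names and constants are assumptions.
class AdaptiveSpinLock {
public:
    // Inline fast path: one test-and-set handles the uncontended case.
    void lock() {
        bool expected = false;
        if (!locked_.compare_exchange_strong(expected, true,
                                             std::memory_order_acquire))
            lockSlow();  // contended: take the slow path
    }

    bool tryLock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }

    int spinLimit() const { return spinLimit_.load(std::memory_order_relaxed); }

    static constexpr int kMinSpin = 16;    // assumed lower bound
    static constexpr int kMaxSpin = 4096;  // assumed upper bound

private:
    // Read first, then attempt the exchange, to avoid needless cache traffic.
    bool tryAcquire() {
        if (locked_.load(std::memory_order_relaxed))
            return false;
        bool expected = false;
        return locked_.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    void adjustSpinLimit(int delta) {
        int next = spinLimit_.load(std::memory_order_relaxed) + delta;
        if (next < kMinSpin) next = kMinSpin;
        if (next > kMaxSpin) next = kMaxSpin;
        spinLimit_.store(next, std::memory_order_relaxed);
    }

    // Slow path: spin for a bounded number of iterations, then back off.
    void lockSlow() {
        const int limit = spinLimit_.load(std::memory_order_relaxed);
        for (int i = 0; i < limit; ++i) {
            if (tryAcquire()) {
                adjustSpinLimit(+8);  // spinning paid off: allow a bit more
                return;
            }
        }
        adjustSpinLimit(-32);  // spinning was wasted: spin less next time
        while (!tryAcquire())
            std::this_thread::yield();  // stand-in for a real blocking wait
    }

    std::atomic<bool> locked_{false};
    std::atomic<int> spinLimit_{256};  // assumed starting value
};
```

The "adaptive" part is only the feedback on the spin limit: short critical
sections keep the waiters spinning, long ones push them towards the blocking
path quickly. Note that a fast path like lock() above, when inlined into user
code, is what a tool like helgrind or DRD cannot intercept.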