Hello

I've just done some benchmarking of QMutex on Linux, using the pthread
implementation instead of the futex one.

Conclusions first:

QMutex is optimised for the uncontended case. It does that by keeping the d
pointer at NULL while unlocked and setting it to 0x3 to indicate that it's
locked. Changing from one value to the other is extremely quick, requiring a
single atomic operation. When uncontended, QMutex proves to be roughly 16%
faster than pthread. This also shows in the benchmarks that use a non-zero
msleep, where the mutex is mostly uncontended.
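
To illustrate the fast path described above, here is a minimal sketch written
with std::atomic. It is not the actual QMutex code: the class name SketchMutex
and its member functions are made up for this example, and the slow path (what
happens once the compare-and-swap fails and a thread has to wait) is left out
entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative only: SketchMutex is a hypothetical name, not Qt's QMutex.
class SketchMutex {
public:
    // Destroying a mutex that is still locked is a usage error.
    ~SketchMutex() { assert(!isLocked()); }

    // Fast lock: a single compare-and-swap from "unlocked" (nullptr) to the
    // "locked" marker (0x3). Returns false if the mutex is already held; a
    // real mutex would then fall back to a slow path that blocks.
    bool tryLockFast() noexcept {
        void *expected = nullptr;
        return d.compare_exchange_strong(expected, lockedMarker(),
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
    }

    // Fast unlock: a single compare-and-swap from the marker back to nullptr.
    // Returns false if d holds anything else, which in a real mutex would
    // mean the slow path has to take over.
    bool unlockFast() noexcept {
        void *expected = lockedMarker();
        return d.compare_exchange_strong(expected, nullptr,
                                         std::memory_order_release,
                                         std::memory_order_relaxed);
    }

    bool isLocked() const noexcept {
        return d.load(std::memory_order_relaxed) != nullptr;
    }

private:
    // The value 0x3 is taken from the description above.
    static void *lockedMarker() noexcept {
        return reinterpret_cast<void *>(std::uintptr_t(0x3));
    }

    std::atomic<void *> d{nullptr};
};
```

In the uncontended case both calls succeed on the first try, so a lock/unlock
pair costs two atomic operations and no system call.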

That comes at a price, though: the performance drops considerably when
contention happens.

When contention happens at a low rate (the "msleep(0)" case), QMutex
performance is similar to that of pthread, though slightly worse (up to 5%).

When contention happens a lot, the performance is awful: I've measured QMutex
at anything from 100% to over 1000% slower than pthread.
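
For reference, these percentages can be derived from the "CPU ticks per
iteration" figures in the DATA section. The helper below is only a sketch of
that arithmetic; the name percentSlower is made up for this example.

```cpp
#include <cassert>

// Percentage by which the candidate is slower than the baseline, computed
// from "CPU ticks per iteration". A negative result means it is faster.
double percentSlower(double baselineTicks, double candidateTicks) {
    assert(baselineTicks > 0.0);
    return (candidateTicks / baselineTicks - 1.0) * 100.0;
}

// Figures copied from the DATA section (native first, QMutex second):
//   no msleep, 1 mutex:   percentSlower(2052212.507, 4087893.432)
//   no msleep, 2 mutexes: percentSlower(2550929.603, 29396174.807)
//   msleep(0), 1 mutex:   percentSlower(67621168.46, 70621368.59)
//   uncontended:          percentSlower(60.5891925, 50.7105596)
```

With those inputs the results come out at roughly 99%, 1052%, 4.4% and -16%
respectively.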

Extrapolating these results to Mac and Windows, I expect QMutex performance in
the uncontended case to be *much* better, but still to lose horribly in the
contended case.

Conclusion: I'm glad I use Linux and that we have futex.

DATA:

Reference:
Intel i7-2620M (SandyBridge)
        2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
        CPU in "performance" governor
Linux 3.5.2
glibc 2.15
Fedora 17
GCC 4.7.1, 64-bit mode
QtCore linked with LTO

All results are the best out of 6 runs, under realtime FIFO scheduling.

Uncontended Mutex results (100 million iterations):

RESULT : tst_QMutex::uncontendedNative():
     60.5891925 CPU ticks per iteration
        450.189192 task-clock                #    0.999 CPUs utilized
     1,511,489,291 cycles                    #    3.357 GHz
     1,306,287,711 instructions              #    0.86  insns per cycle
               197 raw_syscalls:sys_enter    #    0.438 K/sec
       0.450477229 seconds time elapsed

RESULT : tst_QMutex::uncontendedQMutex():
     50.7105596 CPU ticks per iteration
        379.784144 task-clock                #    0.999 CPUs utilized
     1,268,507,621 cycles                    #    3.340 GHz
       745,975,928 instructions              #    0.59  insns per cycle
               194 raw_syscalls:sys_enter    #    0.511 K/sec
       0.380036271 seconds time elapsed

Contended Mutex results (1000 iterations):

RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
     2,052,212.507 CPU ticks per iteration
       5814.825257 task-clock                #    3.797 CPUs utilized
    18,513,286,444 cycles                    #    3.184 GHz
    13,801,932,519 instructions              #    0.75  insns per cycle
         8,609,051 raw_syscalls:sys_enter    #    1.481 M/sec
       1.531495948 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
     4,087,893.432 CPU ticks per iteration
      11037.507260 task-clock                #    2.699 CPUs utilized
    33,483,481,790 cycles                    #    3.034 GHz
    21,436,137,659 instructions              #    0.64  insns per cycle
        12,012,804 raw_syscalls:sys_enter    #    1.088 M/sec
       4.088957193 seconds time elapsed

The other five runs of this test gave 4.2, 5.7, 5.8, 6.7 and 7.1 million ticks
per iteration.

RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
     2,550,929.603 CPU ticks per iteration
       7155.513345 task-clock                #    3.763 CPUs utilized
    22,760,839,897 cycles                    #    3.181 GHz
    16,370,712,299 instructions              #    0.72  insns per cycle
        10,457,934 raw_syscalls:sys_enter    #    1.462 M/sec
       1.901400808 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
     29,396,174.807 CPU ticks per iteration
      48627.618792 task-clock                #    2.183 CPUs utilized
   141,749,504,525 cycles                    #    2.915 GHz
    78,008,558,700 instructions              #    0.55  insns per cycle
        38,536,844 raw_syscalls:sys_enter    #    0.792 M/sec
      22.271697343 seconds time elapsed

Contended Mutex results (100 iterations):

RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
     67,621,168.46 CPU ticks per iteration
       4326.998212 task-clock                #    0.859 CPUs utilized
    11,239,050,634 cycles                    #    2.597 GHz
     8,415,799,134 instructions              #    0.75  insns per cycle
         2,965,384 raw_syscalls:sys_enter    #    0.685 M/sec
       5.036652093 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
     70,621,368.59 CPU ticks per iteration
       4909.514006 task-clock                #    0.934 CPUs utilized
    13,123,468,429 cycles                    #    2.673 GHz
     9,532,793,349 instructions              #    0.73  insns per cycle
         3,619,607 raw_syscalls:sys_enter    #    0.737 M/sec
       5.253921952 seconds time elapsed

RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
     67,478,669.37 CPU ticks per iteration
       4314.232114 task-clock                #    0.857 CPUs utilized
    11,244,572,017 cycles                    #    2.606 GHz
     8,382,057,867 instructions              #    0.75  insns per cycle
         2,939,351 raw_syscalls:sys_enter    #    0.681 M/sec
       5.035212837 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
     70,837,078.76 CPU ticks per iteration
       4933.702732 task-clock                #    0.929 CPUs utilized
    13,192,133,179 cycles                    #    2.674 GHz
     9,554,807,698 instructions              #    0.72  insns per cycle
         3,622,623 raw_syscalls:sys_enter    #    0.734 M/sec
       5.309986829 seconds time elapsed

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
