Re: [dev] x86 osl/interlck.h performance

Jens-Heiner Rechtien Wed, 26 Apr 2006 10:18:49 -0700

Hi,

I did some measurements with a copy of SRC680 m164 and one of the more pathological Calc documents, and found that the "lock" prefix indeed imposes a significant overhead of about 8% on a non-HT 1.8 GHz Pentium IV.

(The tests included starting StarOffice, loading the document, and closing the application as soon as the document was loaded.)


$ time ./soffice numbers_large.ods
            with "lock"   w/o "lock"
user time:  41.474s       38.379s
user time:  41.611s       38.676s
user time:  41.796s       38.397s
user time:  41.623s       38.412s
user time:  41.696s       38.742s

mean:       41.64s        38.52s

Comparing the wall clock times showed essentially the same value of 8% overhead for the "lock" case.


Heiner


Stephan Bergmann wrote:

Hi all,
Someone recently mentioned that osl_increment/decrementInterlockedCount would show up as top scorers with certain profiling tools (vtune?). That got me thinking. On both Linux x86 and Windows x86, those functions are implemented in assembler, effectively consisting of a LOCK-prefixed XADD. Now, I thought that, at least on a uniprocessor machine, the LOCK would probably not be that expensive, but that the profiling tool in question might be confused by it and present bogus results.
However, the following little program on Linux x86 (where incLocked is a copy of osl_incrementInterlockedCount, and incUnlocked is the same, without the LOCK prefix) told a different story:
  // lock.c
  #include <stdio.h>
  int incLocked(int * p) {
    int n;
    __asm__ __volatile__ (
      "movl $1, %0\n\t"
      "lock\n\t"
      "xaddl %0, %2\n\t"
      "incl %0" :
      "=&r" (n), "=m" (*p) :
      "m" (*p) :
      "memory");
    return n;
  }
  int incUnlocked(int * p) {
    int n;
    __asm__ __volatile__ (
      "movl $1, %0\n\t"
      "xaddl %0, %2\n\t"
      "incl %0" :
      "=&r" (n), "=m" (*p) :
      "m" (*p) :
      "memory");
    return n;
  }
  int main(int argc, char ** argv) {
    int i;
    int n = 0;
    if (argc > 1 && argv[1][0] == 'l') {
      puts("locked version");
      for (i = 0; i < 100000000; ++i) {
        incLocked(&n);
      }
    } else {
      puts("unlocked version");
      for (i = 0; i < 100000000; ++i) {
        incUnlocked(&n);
      }
    }
    return 0;
  }

m1> cat /proc/cpuinfo
  processor : 0
  model name: Intel(R) Pentium(R) 4 CPU 1.80GHz
  ...
m1> time ./lock l
  locked version
  11.868u 0.000s 0:12.19 97.2%  0+0k 0+0io 0pf+0w
m1> time ./lock u
  unlocked version
  1.516u 0.000s 0:01.57 96.1%  0+0k 0+0io 0pf+0w

m2> cat /proc/cpuinfo
  processor : 0
  model name: AMD Opteron(tm) Processor 242
  processor : 1
  model name: AMD Opteron(tm) Processor 242
  ...
m2> time ./lock l
  locked version
  1.863u 0.000s 0:01.86 100.0%  0+0k 0+0io 0pf+0w
m2> time ./lock u
  unlocked version
  0.886u 0.000s 0:00.89 98.8%  0+0k 0+0io 0pf+0w
So, depending on CPU type, the version with LOCK is 2--8 times slower than the version without LOCK. Would be interesting to see whether this has any actual impact on overall OOo performance. (But first, I'm off on vacation...)
-Stephan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Jens-Heiner Rechtien
[EMAIL PROTECTED]
