Is the implementation of Illumos user-space locks known to take an
abnormally long time on the Opteron 6200 ("Interlagos", "Bulldozer")
CPUs?

I ask because 'prstat -L -m' shows that a large share of my
OpenMP-threaded application's time, anywhere from 30% to 90%, is being
spent in user-space locks. Whenever the reported lock time is high,
performance is terrible.

Even when I run a benchmark with only one OpenMP thread allowed, I
see as much as 25% of the time being spent in locks.

This happens even when there is quite a lot of data to process, and
therefore quite a long time between lock acquisitions. No lock should
be held for more than a few microseconds.
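
One way to separate the cost of the lock primitive itself from any
contention would be to time uncontended lock/unlock pairs on both
machines. The following is only a minimal sketch: it assumes that the
OpenMP runtime's locks behave roughly like POSIX mutexes, which I have
not verified, and avg_uncontended_lock_ns is a hypothetical helper
name, not something from my application.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/*
 * Average cost, in nanoseconds, of one uncontended lock/unlock pair.
 * Assumption: a plain POSIX mutex stands in for whatever lock the
 * OpenMP runtime really uses.
 */
double avg_uncontended_lock_ns(long iterations)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct timespec t0, t1;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++) {
        /* Single thread, so the lock is never contended. */
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_mutex_destroy(&m);

    /* Elapsed time in nanoseconds, divided by the number of pairs. */
    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling avg_uncontended_lock_ns(1000000) from a small main() on each
system and comparing the two results would show whether the primitive
itself is slower on the 6200, independent of the application.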

The same test on a quad-core Opteron system running Solaris 10 shows
about 0.1% of the time being spent in locks. With four cores enabled
on the Opteron 6282 SE, about 17% of the time is reported to be spent
in locks, and this is for a "good" algorithm.

On the quad-core Opteron system, the DTraceToolkit's lockbyproc.d
shows that my program spent 95,166 nanoseconds (about 0.000095 s) in
locks for a test duration of 33.16 s. This is what lockbydist.d says
about the time distribution of the locks:

  gm64Q16
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@                           5
           32768 |@@@@@@@@@@@                              4
           65536 |@@@@@@@@@@@@@@                           5
          131072 |                                         0
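
To put the lockbyproc.d total in proportion, the share of the run
spent in locks can be worked out directly from the two figures above.
A small sketch of that arithmetic (lock_time_percent is just an
illustrative helper name):

```c
#include <stdio.h>

/*
 * Percentage of the run spent in locks.
 * lock_ns: total lock time in nanoseconds (from lockbyproc.d)
 * run_s:   test duration in seconds
 */
double lock_time_percent(double lock_ns, double run_s)
{
    return (lock_ns / 1e9) / run_s * 100.0;
}

/* Print the share for the quad-core run quoted above. */
void print_quad_core_share(void)
{
    printf("%.6f%% of the run spent in locks\n",
           lock_time_percent(95166.0, 33.16));
}
```

For 95,166 ns over 33.16 s this comes to roughly 0.0003% of the run.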

My application seems to be properly designed.

Ideas?

Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/