Re: [discuss] User-space locks slow on Opteron 6200?

Bob Friesenhahn Thu, 12 Apr 2012 09:00:57 -0700

On Thu, 12 Apr 2012, Hans Rosenfeld wrote:

On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote:

My OpenMP-based application definitely fits the description of a
potentially "problematic application" because it does execute the same
code in tight loops in both cores of a compute unit.  That is its
whole purpose.  The algorithms mostly qualify as "embarrassingly
parallel".  The code is part of the same application so the page
mappings should be identical.  If the shared inner loops fail to fit
in the L1 instruction cache or there is aliasing then the performance
would be poor.


Could that application be turned into a test case that I could use to
benchmark and debug this further?

The application is open source and available from "http://www.graphicsmagick.org/". It is one of the few OpenMP-based applications outside of the HPC space, and one of the very few OpenMP-based applications that one might find in a Linux distribution. A version of it is included in a popular benchmark suite.

I can send you a script with a few inputs which acts as a benchmark. The application includes its own built-in benchmarking facility.

I was hoping to investigate GCC's bdver1 output (which does try to
address L1 instruction cache issues) on Illumos but I discovered that
Illumos is not currently capable of executing this code ("illegal
instruction").


Did you test this with the latest code from illumos-gate? The patches to
support the new instruction sets on Bulldozer just went in a few days
ago.

No. I don't have physical access to the system. I could update kernel binaries and remotely reboot the system if it is reasonably easy to install/modify the kernel. There is little time available though since the system will be gone tomorrow.

Could you compile your program with gcc and tuned for barcelona on Linux
and compare the runtime with Illumos on the same hardware?

I did this previously on a 16-core Opteron 6200 system (courtesy of the same hardware vendor) and have the results available. Linux and Illumos GCC results were quite similar. Performance with the AMD Open64 compiler on Linux (the one that AMD benchmarks with) was better and much more consistent. There was never anything close to a 16X performance boost, although the software can achieve linear speedup (12X speedup, or more, with 12 cores) for some algorithms on Intel Xeon CPUs.

64-core is pretty different from 16-core since there is a whole lot more contention going on for the same amount of work.
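
For anyone comparing runs, the speedup figures above (12X with 12 cores, nothing close to 16X with 16) can be reduced to a speedup and a per-thread efficiency. This is only a rough sketch: the function names are made up for illustration and the timings are hypothetical placeholders, not measurements from either system.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the single-thread run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads):
    """Fraction of ideal linear scaling achieved (1.0 means linear)."""
    return speedup(t_serial, t_parallel) / threads


# Hypothetical wall-clock times in seconds, keyed by thread count.
# These are illustrative placeholders, NOT measured results.
timings = {1: 120.0, 12: 10.0, 16: 15.0}

t1 = timings[1]
for threads, t in sorted(timings.items()):
    print(f"{threads:3d} threads: speedup {speedup(t1, t):5.2f}X, "
          f"efficiency {efficiency(t1, t, threads):4.2f}")
```

An efficiency near 1.0 corresponds to the linear scaling described for some algorithms on Xeon; a value well below 1.0 at 16 threads is the kind of result that would point at contention rather than at the algorithm.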


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


