On Thu, 12 Apr 2012, Hans Rosenfeld wrote:

FYI, there is a white paper about the L1I cache aliasing issue on family
0x15 here:

http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf

I don't know whether this applies to Illumos or this particular problem
at all, but it was the first thing I remembered when I read "slow" and
"Opteron" :)

My OpenMP-based application definitely fits the description of a potentially "problematic application" because it executes the same code in tight loops on both cores of a compute unit. That is its whole purpose. The algorithms mostly qualify as "embarrassingly parallel". The code is part of the same application, so the page mappings should be identical. If the shared inner loops fail to fit in the L1 instruction cache, or if there is aliasing, then the performance would be poor.
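To make the pattern concrete, here is a minimal sketch of the kind of loop I mean. It is not taken from my application: the function name scale_samples and its parameters are made up for illustration. Every OpenMP thread runs the same small loop body on its own slice of the data, so both cores of a compute unit end up fetching the same instructions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical kernel (not from the real application): scale every
 * sample in place. With OpenMP enabled, each thread executes this same
 * loop body on a different slice of the array.
 */
void scale_samples(float *samples, size_t count, float gain)
{
    long i;
    long n = (long)count;

    assert(samples != NULL || count == 0);

    /* Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) {
        samples[i] *= gain;
    }
}
```

With GCC this would be built with -fopenmp. The result is the same with or without threading, since each iteration touches only its own element.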

I was hoping to investigate GCC's bdver1 output (which does try to address L1 instruction cache issues) on Illumos, but I discovered that Illumos is not currently capable of executing this code ("illegal instruction"). Using 'barcelona' instructions with 'bdver1' tuning does produce code which executes, but the performance is 1/3 lower than that of normal 'barcelona' code. Under Linux I discovered that using AMD's Open64 compiler (producing Bulldozer-optimized output) led to considerably improved use of the top 1/2 of the available cores. I was hoping to discover the same with GCC code.

Remarkably, I was not able to find any discussion or reports of GCC 'bdver1' code performance on the Internet.

Although our default spin count for adaptive mutexes seems to be 10
times what glibc used, using a higher value as described in
http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/threads/synch.c#88
could be worth a try.

Alas, I am not in a position to try this at the moment. I have part-time remote use of the system until tomorrow.
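For anyone following along, the general spin-then-block idea behind an adaptive mutex can be sketched roughly as below. This is only an illustration using standard pthread calls; it is not how the Illumos libc implements it, and the helper name adaptive_lock and its spin_count parameter are my own invention. The real tuning is what the synch.c link above describes.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper (not the libc implementation): try to take the
 * lock without blocking up to spin_count times, then fall back to a
 * normal blocking lock. Returns 0 on success, or the error code from
 * pthread_mutex_lock().
 */
int adaptive_lock(pthread_mutex_t *m, unsigned spin_count)
{
    unsigned i;

    assert(m != NULL);

    for (i = 0; i < spin_count; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return 0;               /* acquired while spinning */
    }
    return pthread_mutex_lock(m);   /* stop spinning and block */
}
```

A larger spin_count favors short critical sections on a machine with many cores, at the cost of burning CPU time when the lock is held for long.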

My next system will run an Illumos kernel. The purpose of my current investigation is to see if it should be a 64-core Opteron 6200 or a 16-core E5 Xeon based system (they cost the same). I am a poor free software developer purchasing expensive hardware out of his own pocket, so it is important to make the right decision. The system vendor has been quite helpful to me, but it seems that my original question is still unanswered.

Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

