On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote:
> My OpenMP-based application definitely fits the description of a 
> potentially "problematic application" because it does execute the same 
> code in tight loops in both cores of a compute unit.  That is its 
> whole purpose.  The algorithms mostly qualify as "embarrassingly 
> parallel".  The code is part of the same application so the page 
> mappings should be identical.  If the shared inner loops fail to fit 
> in the L1 instruction cache or there is aliasing then the performance 
> would be poor.

Could that application be turned into a test case that I could use to
benchmark and debug this further?
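
For illustration, here is a minimal sketch of what such a test case might
look like, assuming a C program built with OpenMP. Every thread runs the same
small inner loop on private buffers, so the instruction stream is shared
while the data is not. The kernel body, the names `kernel` and
`run_benchmark`, and all parameters are hypothetical stand-ins, not taken
from the application described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the application's shared inner loop.
   Every thread executes this same code on its own data. */
static double kernel(const double *in, double *out, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * 1.0001 + 0.5;
        sum += out[i];
    }
    return sum;
}

/* Wall-clock seconds when built with OpenMP. Without OpenMP this falls
   back to CPU time from clock(), which is adequate for one thread only. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Runs the kernel `reps` times in each of `nthreads` threads and returns
   the elapsed time for the whole parallel region, including buffer setup.
   The summed results are stored in *checksum so that the compiler cannot
   discard the work. */
double run_benchmark(int nthreads, size_t n, int reps, double *checksum)
{
    double total = 0.0;
    double start = now_seconds();

    #pragma omp parallel num_threads(nthreads) reduction(+:total)
    {
        /* Private buffers: threads share code, not data. */
        double *in = malloc(n * sizeof *in);
        double *out = malloc(n * sizeof *out);
        if (in != NULL && out != NULL) {
            for (size_t i = 0; i < n; i++)
                in[i] = (double)i;
            for (int r = 0; r < reps; r++)
                total += kernel(in, out, n);
        }
        free(in);
        free(out);
    }

    *checksum = total;
    return now_seconds() - start;
}
```

Calling `run_benchmark` from a small `main` with increasing thread counts and
printing the returned times would show how the runtime scales. The
combination described as 'barcelona' instructions with 'bdver1' tuning would
correspond to building with something like
`gcc -O2 -fopenmp -march=barcelona -mtune=bdver1`. Which cores the threads
land on is left to the OS scheduler in this sketch.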

> I was hoping to investigate GCC's bdver1 output (which does try to 
> address L1 instruction cache issues) on Illumos but I discovered that 
> Illumos is not currently capable of executing this code ("illegal 
> instruction"). 

Did you test this with the latest code from illumos-gate? The patches to
support the new instruction sets on Bulldozer just went in a few days
ago.

> Using 'barcelona' instructions with 'bdver1' tuning 
> does produce code which executes, but the performance is 1/3 less than 
> normal 'barcelona' code.  Under Linux I discovered that using AMD's 
> Open64 compiler (producing Bulldozer-optimized output) led to 
> considerably improved use of the top 1/2 of the available cores.  I 
> was hoping to discover the same with GCC code.

Could you compile your program with gcc, tuned for barcelona, on Linux
and compare the runtime with Illumos on the same hardware?


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

