I did the exact same thing and dumped the assembly generated by C and C#
for nbody. In addition to the sqrt issue, which I believe comes down to
pre- versus post-SSEn code generation (Mono isn't as up to date on making
full use of SSEn), the other issue was that Mono did a few sets of
unnecessary register transfers compared to the assembly generated by C.
With these two issues resolved, the benchmark times would have matched.
I am not even sure Mono's older sqrt call was the majority of the
difference in the benchmark; I believe it was the unnecessary register
transfers.

You will also notice in that benchmarks game that many of the C versions
(the most recent version of a given benchmark) are often not even
solutions that can be compared. I think there is even one threaded
solution among the winners, while Mono uses a single thread. Really, that
"game" has far more to do with people making better and better algorithms
for a given language's solution than it is a comparison of languages.
Having said that, it seems to me, from comparing the assembly code of a
few of them, that aside from the obvious issues of array bounds checks
and so on for safety, the main performance killer (for Mono) appears to
be non-optimal use of registers, with too much unnecessary
transfer/setup. This, however, is only really noticeable in these huge
loops, which for many people using Mono isn't an issue. Another issue I
noticed is that in the latest SSE4.? there are 16 registers to use, but I
see Mono shuffling within 8 (I think, if I remember correctly). Oddly
enough, I didn't see GNU gcc using the available 16 either.
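
To make the bounds-check point concrete, here is a rough C analogue. It
is only a sketch: the function names are invented, and the explicit "if"
merely stands in for the check a managed runtime adds when it cannot
prove an index is in range. It is not what Mono actually emits.

```c
#include <stddef.h>

/* Plain loop: no per-element check, as a C compiler normally sees it. */
double sum_unchecked(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop with an explicit check of every index against the array's
   real length `len`. Returns 0 and stores the sum in *out on success,
   or -1 as soon as an index would fall outside the array. */
int sum_checked(const double *a, size_t len, size_t n, double *out)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i >= len)
            return -1;      /* stand-in for an out-of-range exception */
        s += a[i];
    }
    *out = s;
    return 0;
}
```

Dumping the assembly for both versions shows the extra compare and branch
per iteration in the checked one, which is separate from the register
shuffling I described above.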

tl

On Mon, 4 Oct 2010 10:43:18 -0400
Jonathan Shore <[email protected]> wrote:

> Hi,
> 
> I am looking forward to moving all of my code from Java / C++ to F# / C# in 
> the very near future.   I took the nbody code from the language shootout and 
> ran with 500 million iterations (much more than used in the shootout to 
> provide a fair comparison) on ubuntu server on a core i7 920 box.
> 
> I used:
> 
> - C++ (g++ -O3 with various MMX related flags as done in the shootout)
> - Java 7  -server
> - Mono 2.4.4, compiling with -optimize:+
> 
> I had the following results in seconds:
> 
> 1.  C++:     98 seconds
> 2.  JVM:    126 seconds,  a 28% performance gap against C++
> 3.  Mono:   191 seconds,  a 50% performance gap with the JVM
> 
> Because the nbody problem uses sqrt for the Euclidean distance in each loop, 
> I thought that maybe the discrepancy might be more related to the 
> implementation of Sqrt().
> 
> I implemented a (very poor) numerical algorithm as a substitute for the 
> sqrt() function in each implementation to provide an apples-to-apples 
> comparison.    The new numbers became:
> 
> 1.  C++:    517 seconds
> 2.  JVM:    527 seconds
> 3.  Mono:   223 seconds (wow, a surprise here)
> 
> I noticed that the Mono runtime libraries use an internal implementation of 
> Sqrt() that seems to resolve to an Op Code.   I am wondering, ultimately, 
> what implementation this maps to?   Clearly the Sqrt implementation in Mono 
> is 2x as slow (or access through the layers is 2x as slow) as the libc 
> implementation.   
> 
> I do mostly numerical work, so I am concerned about sqrt as well as other 
> fundamental functions in this regard.   Are these custom implementations in 
> assembler for each arch?    Would it be reasonable to try to map these to the 
> existing libc library when available?
> 
> Thanks
> 
> --
> Jonathan Shore
> Systematic Trading Group
> 
> _______________________________________________
> Mono-list maillist  -  [email protected]
> http://lists.ximian.com/mailman/listinfo/mono-list
> 


-- 
ted leslie <[email protected]>