I did the exact same thing: I dumped the assembly generated by C and C# for nbody. In addition to the sqrt issue, which I believe comes down to pre- versus post-SSEn code generation (Mono isn't as up to date on making full use of SSEn), the other issue was that Mono did a few sets of unnecessary register transfers compared to the assembly generated by C. With these two issues resolved, the benchmark times would have matched. I am not even sure Mono's older sqrt call accounted for the majority of the difference in the benchmark; I believe it was the unnecessary register transfers.
You will also notice that in the benchmarks game, many of the C versions (the most recent version of a given benchmark) often aren't even solutions that can be fairly compared. I think there is even one winner that is a threaded solution, while the Mono version uses a single thread. Really, that "game" has far more to do with people making better and better algorithms for a given language's solution than it is a comparison of languages.

Having said that, it seems to me, from comparing the assembly code of a few of them, that aside from the obvious issues of array bounds checks and so on for safety, the main performance killer (for Mono) appears to be non-optimal use of registers, with too much unnecessary transfer/setup. However, this is only really noticeable in these huge loops, which for many people using Mono aren't an issue.

Another issue I noticed is that in the latest SSE4.? there are 16 registers to use, but I see Mono shuffling within 8 (I think, if I remember correctly). Oddly enough, I didn't see GNU gcc using the available 16 either.

tl

On Mon, 4 Oct 2010 10:43:18 -0400
Jonathan Shore <[email protected]> wrote:

> Hi,
>
> I am looking forward to moving all of my code from Java / C++ to F# / C# in
> the very near future. I took the nbody code from the language shootout and
> ran with 500 million iterations (much more than used in the shootout to
> provide a fair comparison) on ubuntu server on a core i7 920 box.
>
> I used:
>
> - C++ (g++ -O3 with various MMX related flags as done in the shootout)
> - Java 7 -server
> - Mono 2.4.4, compiling with -optimize:+
>
> I had the following results in seconds:
>
> 1. C++: 98 seconds
> 2. JVM: 126 seconds, a 28% performance gap against C++
> 3. Mono: 191 seconds, a 50% performance gap with the JVM
>
> Because the nbody problem uses sqrt for the euclidean distance in each loop,
> thought that maybe the discrepancy might be more related to the
> implementation of Sqrt().
>
> I implemented a (very poor) numerical algorithm as a substitute for the
> sqrt() function in each implementation to provide an apples-to-apples
> comparison. The new numbers became:
>
> 1. C++: 517 seconds
> 2. JVM: 527 seconds
> 3 Mono: 223 seconds (wow, a surprise here)
>
> I noticed that the Mono runtime libraries use an internal implementation of
> Sqrt() that seems to resolve to an Op Code. I am wondering, ultimately,
> what implementation this maps to? Clearly the Sqrt implementation in Mono
> is 2x as slow (or access through the layers is 2x as slow) as the libc
> implementation.
>
> I do mostly numerical work, so concerned about sqrt as well as other
> fundamental functions in this regard. Are these custom implementations in
> assembler for each arch? Would it be reasonable to try to map these to the
> existing libc library when available?
>
> Thanks
>
> --
> Jonathan Shore
> Systematic Trading Group
>
> _______________________________________________
> Mono-list maillist - [email protected]
> http://lists.ximian.com/mailman/listinfo/mono-list
>

--
ted leslie <[email protected]>
