On Wed, Oct 26, 2011 at 6:15 PM, Rodrigo Kumpera <[email protected]> wrote:
> The CLR demands that all floating point calculations be conducted with
> double precision.

Where in the spec does it state this? Everything I've read allows for
both 32-bit and 64-bit representations, with the only restriction being
that arithmetic must be performed with enough precision for the
expression type.

And if 64-bit arithmetic is required, why is 32-bit used in the 32-bit
Mono VM, and in the .NET VM?

> On Wed, Oct 26, 2011 at 4:00 PM, Justin Holewinski <
> [email protected]> wrote:
>
>> I'm currently testing Mono on some single-precision FP-heavy workloads,
>> and I'm a bit surprised to see that the performance of the 64-bit VM is
>> significantly slower than the 32-bit VM, over 2x in many cases. As an
>> example, on Mac OS X 10.7, with Mono 2.10.6 compiled in both 32-bit and
>> 64-bit modes:
>>
>> Matrix Multiply Micro-Benchmark:
>>
>> jholewinski@aquila [tests]$ ~/projects/mono/install/x86/bin/mono -O=all
>> embed1-extract.exe
>> Scalar: 362.775 ms
>> Mono.Simd: 164.645 ms
>> jholewinski@aquila [tests]$ ~/projects/mono/install/x64/bin/mono -O=all
>> embed1-extract.exe
>> Scalar: 841.482 ms
>> Mono.Simd: 131.844 ms
>>
>> The Mono.Simd case is good, but for the scalar code that is a
>> large discrepancy. Further, if I look at the disassembly from Mono, it
>> looks like the 64-bit VM is using double-precision arithmetic for
>> single-precision data types with the non-Mono.Simd version:
>>
>> 000000000000001b movss 0x00(%r13),%xmm0
>> 0000000000000021 cvtss2sd %xmm0,%xmm0
>> 0000000000000025 movss (%r14),%xmm1
>> 000000000000002a cvtss2sd %xmm1,%xmm1
>> 000000000000002e mulsd %xmm1,%xmm0
>> 0000000000000032 movss 0x04(%r13),%xmm1
>> 0000000000000038 cvtss2sd %xmm1,%xmm1
>> 000000000000003c movss 0x10(%r14),%xmm2
>> 0000000000000042 cvtss2sd %xmm2,%xmm2
>> 0000000000000046 mulsd %xmm2,%xmm1
>> 000000000000004a addsd %xmm1,%xmm0
>> 000000000000004e movss 0x08(%r13),%xmm1
>> 0000000000000054 cvtss2sd %xmm1,%xmm1
>> 0000000000000058 movss 0x20(%r14),%xmm2
>> 000000000000005e cvtss2sd %xmm2,%xmm2
>> 0000000000000062 mulsd %xmm2,%xmm1
>> 0000000000000066 addsd %xmm1,%xmm0
>> 000000000000006a movss 0x0c(%r13),%xmm1
>>
>> This could definitely account for the performance discrepancy. Why is
>> Mono up-converting to doubles for single-precision expressions?
>>
>> The 32-bit VM appears to be using the x87 stack instead of SSE scalar
>> instructions, but at least it's using single-precision.
>>
>> --
>>
>> Thanks,
>>
>> Justin Holewinski
>>
>> _______________________________________________
>> Mono-list maillist - [email protected]
>> http://lists.ximian.com/mailman/listinfo/mono-list
>>
>

--

Thanks,

Justin Holewinski
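
For anyone following the thread, here is a minimal C sketch of the two
evaluation strategies being compared. It is not Mono's code. The function
names and the col_stride parameter are hypothetical, chosen to mirror the
row-times-column access pattern in the quoted disassembly (consecutive
floats from one pointer, every fourth float from the other). The first
function keeps everything in float. The second widens each operand to
double before multiplying and adding, which is roughly what the
cvtss2sd/mulsd/addsd sequence does.

```c
#include <stddef.h>

/* Hypothetical helper: 4-element dot product kept in single precision.
 * Each product and each partial sum is stored to a float. This assumes a
 * target that evaluates float expressions in float (e.g. SSE scalar code);
 * an x87 build may carry extra precision in registers. */
float dot4_single(const float *row, const float *col, size_t col_stride)
{
    float acc = 0.0f;
    for (size_t i = 0; i < 4; i++) {
        float prod = row[i] * col[i * col_stride];
        acc = acc + prod;
    }
    return acc;
}

/* Hypothetical helper: same dot product, but every operand is widened to
 * double first, and the sum is narrowed back to float only at the end. */
float dot4_via_double(const float *row, const float *col, size_t col_stride)
{
    double acc = 0.0;
    for (size_t i = 0; i < 4; i++) {
        double x = (double)row[i];              /* widen, like cvtss2sd */
        double y = (double)col[i * col_stride]; /* widen, like cvtss2sd */
        acc += x * y;                           /* like mulsd + addsd */
    }
    return (float)acc;
}
```

The widened version pays for two conversions per multiply, which fits the
extra cvtss2sd instructions in the listing. The two functions can also
return different results when intermediate sums do not fit in a float, so
the choice affects both speed and numerical behavior.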
