Trass3r:
> > are you willing and able to show me the asm before it gets assembled?
> > (with gcc you do it with the -S switch). (I also suggest to use only the
> > C standard library, with time() and printf() to produce a smaller asm
> > output: http://codepad.org/12EUo16J ).
You are a person of few words :-) Thank you for the asm.
Apparently the program was not compiled in release mode (or with nobounds; with
DMD it's the same thing, but maybe with gdc it's not). The asm contains
the calls, but they aren't calls to the next line; they are for the asserts and
the array bounds tests:
call _d_assert
call _d_array_bounds
call _d_array_bounds
call _d_assert_msg
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_assert_msg
But I think this doesn't fully explain the low performance. I have also seen
too many instructions that store xmm registers to the stack, like:
movss DWORD PTR [rsp+32], xmm1
movss DWORD PTR [rsp+16], xmm2
movss DWORD PTR [rsp+48], xmm3
If you want to go on with this exploration, then I suggest you find a way to
disable the bounds tests.
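
To make that concrete, here is a minimal sketch (not your program, just a toy
I made up for illustration) of the kind of indexed loop that produces the
_d_array_bounds calls in a non-release build. The file name bounds.d and the
function sum() are hypothetical. The switches in the comments are the ones I
would try first; I have not checked them against your gdc build, so take them
as an assumption and compare the asm before and after.

```d
// Build without bounds tests (assumed switches, check your compiler version):
//   dmd -O -release -inline -noboundscheck bounds.d
//   gdc -O3 -frelease -fno-bounds-check -S bounds.d
import core.stdc.stdio : printf;

// Sums the elements by index. In a non-release build each a[i] is
// typically guarded by a bounds test that calls _d_array_bounds on failure.
float sum(const(float)[] a)
{
    float total = 0.0f;
    foreach (i; 0 .. a.length)
        total += a[i];
    return total;
}

void main()
{
    float[4] data = [1.0f, 2.0f, 3.0f, 4.0f];
    // Cast to double explicitly, since printf's %f expects a double.
    printf("%f\n", cast(double) sum(data[]));
}
```

If the call _d_array_bounds lines disappear from the asm with those switches
but the movss stores to the stack stay, then the bounds tests were only part
of the problem.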
Bye,
bearophile