On 02.08.2013 10:24, Walter Bright wrote:
On 8/2/2013 12:57 AM, Rainer Schuetze wrote:
http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was
throttled for some reason yesterday), relative results don't change:
std.algorithm -main -unittest
dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec
BTW: I usually use VS2008, but now also tried VS2010 - no difference.
The two dmc times shouldn't be the same. I see a definite improvement.
Disassemble aav.obj, and look at the function aaGetRvalue. It should
look like this:
My disassembly looks exactly the same. I don't think that a single div
operation in a rather long function has a lot of impact on modern
processors. I'm running an i7; according to the instruction tables by
Agner Fog, the div has a latency of 17-28 cycles and a reciprocal
throughput of 7-17 cycles. If I add up the latencies annotated in the
asm snippet below, I also get 16 cycles, and that doesn't take the
additional tests and jumps into consideration.
======== note this section does not have a div instruction in it =========
mov EAX,EBX
mov EDX,08421085h ; latency 3
mov ECX,EBX
mul EDX ; latency 5
mov EAX,ECX
sub EAX,EDX ; latency 1
shr EAX,1 ; latency 1
lea EDX,[EAX][EDX] ; latency 1
shr EDX,4 ; latency 1
imul EAX,EDX,01Fh ; latency 3
sub ECX,EAX ; latency 1
mov ESI,ECX
==========================================================================
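
For readers who don't want to trace the registers by hand, here is a rough C sketch of what the snippet above appears to compute: the remainder of an unsigned 32-bit value divided by 31 (01Fh), using a multiply by the constant 08421085h instead of a div. The function name mod31_no_div is made up for illustration, and the mapping from EBX to the parameter x is my reading of the disassembly, not something taken from the compiler source.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the quoted instruction sequence.
   Computes x % 31 without a div instruction. */
static uint32_t mod31_no_div(uint32_t x)
{
    /* mul EDX: keep the upper 32 bits of x * 0x08421085 (the EDX result) */
    uint32_t hi = (uint32_t)(((uint64_t)x * 0x08421085u) >> 32);

    /* sub / shr 1 / lea / shr 4: quotient x / 31 */
    uint32_t q = (((x - hi) >> 1) + hi) >> 4;

    /* imul by 0x1F, then sub: remainder x - q * 31 */
    return x - q * 31u;
}
```

Comparing mod31_no_div(x) against x % 31u for a range of inputs is a quick way to check that this reading of the disassembly is right.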