On 02.08.2013 10:24, Walter Bright wrote:
> On 8/2/2013 12:57 AM, Rainer Schuetze wrote:
>>
>> Although my laptop got quite a bit faster overnight (I guess it was
>> throttled for some reason yesterday), relative results don't change:
>>
>> std.algorithm -main -unittest
>>
>> dmc85?: 12.5 sec
>> dmc857: 12.5 sec
>> msc: 7 sec
>>
>> BTW: I usually use VS2008, but now also tried VS2010 - no difference.
>
> The two dmc times shouldn't be the same. I see a definite improvement.
> Disassemble aav.obj, and look at the function aaGetRvalue. It should
> look like this:

My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has much impact on modern processors. I'm running an i7; according to the instruction tables by Agner Fog, div has a latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles, and that doesn't take the additional tests and jumps into consideration.

======== note this section does not have a div instruction in it ==============
                mov     EAX,EBX
                mov     EDX,08421085h   ; latency 3
                mov     ECX,EBX
                mul     EDX             ; latency 5
                mov     EAX,ECX
                sub     EAX,EDX         ; latency 1
                shr     EAX,1           ; latency 1
                lea     EDX,[EAX][EDX]  ; latency 1
                shr     EDX,4           ; latency 1
                imul    EAX,EDX,01Fh    ; latency 3
                sub     ECX,EAX         ; latency 1
                mov     ESI,ECX
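
For readers who don't want to decode the listing: as far as I can tell, the snippet computes a remainder modulo 31 (the 01Fh in the imul) by multiplying with the constant 08421085h instead of dividing. Below is a minimal C sketch of the same arithmetic. The function names mod31_magic, mod31_div and check_mod31 are made up for illustration and are not taken from aav.c, and treating the value in EBX as an unsigned 32-bit number is my assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x % 31 without a div, mirroring the asm step by step. */
static uint32_t mod31_magic(uint32_t x)
{
    /* mul EDX: keep the high 32 bits of x * 08421085h. */
    uint32_t t = (uint32_t)(((uint64_t)x * 0x08421085u) >> 32);
    /* sub / shr 1 / lea / shr 4: q = x / 31, arranged so that the
       intermediate sum cannot overflow 32 bits. */
    uint32_t q = (((x - t) >> 1) + t) >> 4;
    /* imul by 1Fh, then sub: remainder = x - q * 31. */
    return x - q * 31u;
}

/* Reference version using the plain % operator. */
static uint32_t mod31_div(uint32_t x)
{
    return x % 31u;
}

/* Spot-check both versions against each other on a spread of inputs. */
static void check_mod31(void)
{
    for (uint32_t i = 0; i < 1000000u; i++) {
        uint32_t x = i * 4099u; /* wraps around, which is fine for a spot check */
        assert(mod31_magic(x) == mod31_div(x));
    }
    assert(mod31_magic(0xFFFFFFFFu) == mod31_div(0xFFFFFFFFu));
}
```

Calling check_mod31() from a test program compares the two variants on about a million inputs. It is a spot check of the arithmetic only and says nothing about which variant is faster on a given CPU.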
