On 8/2/2013 2:47 AM, Rainer Schuetze wrote:
My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration.
I'm using an AMD FX-6100.