At 01:11 AM 4/28/2007 -0500, I wrote:
>At 06:37 AM 4/27/2007 -0500, William A. Rowe wrote:
>>nope - the proposed change is a bit more expensive. (magnitude % 10 in
>>any case being the unavoidably most expensive bit.)
[snip]
> /* Eat two digits at a time. */
> while (magnitude > 9) {
> *(short *)--p = two_digit_lut[magnitude % 100];
> --p, magnitude /= 100;
> }
[snip]
Incidentally, I fixed up this loop and ran a little test. (The store should
be "*(short *)(p -= 2)" rather than splitting the subtraction like that, and
the alignment comparison should be inverted.) Apparently Microsoft's compiler
does not use an IDIV. Instead, it uses a bizarre multiplication trick to
obtain the values of "magnitude % 100" and "magnitude / 100":
; magnitude in ecx
mov eax, 1374389535
imul ecx
sar edx, 5
mov eax, edx
shr eax, 31
add eax, edx ; eax is now equal to ecx / 100!
mov edx, eax
imul edx, 100
sub ecx, edx ; ecx is now equal to magnitude % 100
; (magnitude - 100 * (magnitude / 100))
Intuitively, I wouldn't expect this long sequence to be faster, but it must
be or they wouldn't emit it. (I guess I'm significantly underestimating the
cost of a division!) They apparently use it whenever they see both
"value % n" and "value / n" near one another. Also note that in this case,
since it is in a loop where the loop control variable is overwritten, their
register allocator has no qualms about overwriting the original magnitude
value in ecx, since it already has (magnitude / 100) for the next iteration
in eax. I'm not sure how the magic number 1374389535 is computed, but I'm
sure it's not rocket science once you know the trick.
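As a rough sketch of what the trick seems to be: 1374389535 matches
ceil(2^37 / 100), where 37 is the 32 bits skipped by taking edx (the high
half of the product) plus the 5 from "sar edx, 5". The C below mirrors the
listing step by step under that assumption. The names magic_for_100 and
divmod100 are made up for illustration, and the code relies on ">>" of a
negative signed value being an arithmetic shift, which mainstream compilers
provide but the C standard does not guarantee.

```c
#include <stdint.h>

/* Assumption: the constant is ceil(2^37 / 100). */
static int32_t magic_for_100(void)
{
    const int64_t two_pow_37 = (int64_t)1 << 37;
    return (int32_t)((two_pow_37 + 99) / 100);
}

/* Hypothetical helper that follows the assembly listing line by line. */
static void divmod100(int32_t magnitude, int32_t *quot, int32_t *rem)
{
    /* mov eax, 1374389535 / imul ecx: 64-bit signed product in edx:eax */
    int64_t product = (int64_t)magnitude * magic_for_100();

    /* Taking edx drops 32 bits, "sar edx, 5" drops 5 more: 37 in total.
       Assumes an arithmetic right shift for negative values. */
    int32_t edx = (int32_t)(product >> 37);

    /* mov eax, edx / shr eax, 31: eax is 1 if edx is negative, else 0 */
    int32_t eax = (int32_t)((uint32_t)edx >> 31);

    /* add eax, edx: turns the floored quotient into a truncated one */
    eax += edx;

    *quot = eax;                    /* magnitude / 100 */
    *rem = magnitude - eax * 100;   /* imul edx, 100 / sub ecx, edx */
}
```

The sign-bit addition is only needed because magnitude is treated as signed;
it makes the result round toward zero, the way C's "/" does, instead of
toward negative infinity.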
If this is significantly faster than an IDIV or two, then other parts of
the loop become more significant.
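For reference, here is a minimal sketch of the two-digits-at-a-time loop with
the "p -= 2" fix applied, so the remaining per-iteration work is visible. It
is not the code from the earlier mail: it swaps the "*(short *)" store for a
memcpy of a two-character table entry, which sidesteps the alignment and
byte-order questions, and the handling of the final odd digit is my own
assumption since that part was snipped. The names two_digit_pairs,
init_two_digit_pairs and format_magnitude are hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for two_digit_lut: "00", "01", ..., "99". */
static char two_digit_pairs[100][2];

static void init_two_digit_pairs(void)
{
    for (int i = 0; i < 100; ++i) {
        two_digit_pairs[i][0] = (char)('0' + i / 10);
        two_digit_pairs[i][1] = (char)('0' + i % 10);
    }
}

/* Writes the decimal digits of magnitude backwards so that they end just
   before `end`, and returns a pointer to the first digit written. */
static char *format_magnitude(unsigned magnitude, char *end)
{
    char *p = end;

    /* Eat two digits at a time. */
    while (magnitude > 9) {
        p -= 2;
        memcpy(p, two_digit_pairs[magnitude % 100], 2);
        magnitude /= 100;
    }

    /* Assumed tail: one leftover digit, or a lone "0" for zero. */
    if (magnitude > 0 || p == end)
        *--p = (char)('0' + magnitude);

    return p;
}
```

With the division reduced to a multiply and a shift, what is left per
iteration is the table lookup, the two-byte store and the loop branch.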
Jonathan Gilbert