On Friday, 23 December 2016 at 22:11:31 UTC, Walter Bright wrote:
On 12/23/2016 10:03 AM, hardreset wrote:

For this D code:

enum SIZE = 100000000;

void foo(int* a, int* b) {
    int* atop = a + 1000;
    ptrdiff_t offset = b - a;
    for (; a < atop; ++a)
        *a &= *(a + offset);
}

The following asm is generated by DMD:

                push    EBX
                mov     EBX,8[ESP]
                sub     EAX,EBX
                push    ESI
                cdq
                and     EDX,3
                add     EAX,EDX
                sar     EAX,2
                lea     ECX,0FA0h[EBX]
                mov     ESI,EAX
                cmp     EBX,ECX
                jae     L2C
L20:            mov     EDX,[ESI*4][EBX]
                and     [EBX],EDX
                add     EBX,4
                cmp     EBX,ECX
                jb      L20
L2C:            pop     ESI
                pop     EBX
                ret     4

The inner loop is 5 instructions, whereas the one you wrote is 7 (I didn't benchmark it). With some more source code manipulation the divide (the cdq/and/add/sar sequence that scales b - a down to a count of ints) can be eliminated, but it sits outside the loop, so it is irrelevant to the inner loop.
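
As a minimal sketch of one way the divide might be eliminated: keep the distance between the two arrays in bytes rather than in ints, so that b - a never has to be scaled by int.sizeof. The name fooByteOffset and the byte-pointer casts are assumptions made for illustration, not necessarily the manipulation meant above, and the asm DMD generates for this variant has not been checked.

```d
// Hypothetical variant of foo: the offset between the arrays is kept in
// bytes, so forming it is a plain subtraction with no signed divide.
void fooByteOffset(int* a, int* b) {
    int* atop = a + 1000;
    // Byte distance from a to b (assumes both point into valid int arrays).
    ptrdiff_t byteOffset = cast(ubyte*)b - cast(ubyte*)a;
    for (; a < atop; ++a)
        // Step to the matching element of b by byte offset, then read it as an int.
        *a &= *cast(int*)(cast(ubyte*)a + byteOffset);
}
```

It is called exactly like foo, for example fooByteOffset(x.ptr, y.ptr) with two int arrays of at least 1000 elements, and should leave the same result in x.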

I patched up the prolog code and timed it, and it came out identical to my asm. I also tried the C-like ptrdiff_t code, and that still comes out 20% slower here. I'm compiling with:

rdmd test.d -O -release -inline

Am I missing something? How do I get the asm output?


