monarch_dodra:

Could you maybe also add this in your study?

void reverseArr(T)(T[] arr) {
    for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
        swap(*x++, *y--);
}


This is the full program:

import core.stdc.stdio: printf;
import std.algorithm: reverse, swap;

void reverseArr(T)(T[] arr) {
    for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; ) {
        auto aux = *x;
        *x++ = *y;
        *y-- = aux;
    }
}

void reverseArray(T)(T[] arr) {
    for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
        swap(*x++, *y--);
}

void main() {
    auto a = [10, 20, 30, 40];
    foreach(x; a) printf("%d ", x); printf("\n");
    a.reverseArr();
    foreach(x; a) printf("%d ", x); printf("\n");
    a.reverse();
    foreach(x; a) printf("%d ", x); printf("\n");
    a.reverseArray();
    foreach(x; a) printf("%d ", x); printf("\n");
}


And this is the asm of the reverseArray:

reverseArray:
        push EBX
        mov ECX, 0Ch[ESP]
        mov EBX, 0Ch[ESP]
        push ESI
        mov ESI, 0Ch[ESP]
        lea ESI, -4[ESI*4][ECX]
        push EDI
        cmp ECX, ESI
        jae L30
L1A:    mov EAX, EBX
        mov EDI, ESI
        mov EDX, [EAX]
        mov ECX, [EDI]
        add EBX, 4
        sub ESI, 4
        mov [EAX], ECX
        cmp EBX, ESI
        mov [EDI], EDX
        jb  L1A
L30:    pop EDI
        pop ESI
        pop EBX
        ret 8


So the inner loop is 10 asm instructions instead of the 8 asm instructions of the loop of reverseArr(), that is 2 more register swaps, that are performed very quickly by the hardware register renaming in the modern CPUs. So this is a small difference, probably acceptable in many situations. While the difference between the performance of reverseArray() and std.algorithm.reverse is significant.


Related, the starting condition, as written: "y = &arr[$ - 1]" will choke on empty ranges.

I'd have suggested: "y = arr.ptr + arr.length - 1", but that will also choke if the arr.ptr is null. I'd say you should just add an empty short-circuit test int there.

Right. But my study was mostly about what's inside the loop of those functions. What's outside the loop is not influencing performance significantly.

Bye,
bearophile

Reply via email to