monarch_dodra:
Could you maybe also add this in your study?
void reverseArr(T)(T[] arr) {
for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
swap(*x++, *y--);
}
This is the full program:
import core.stdc.stdio: printf;
import std.algorithm: reverse, swap;
void reverseArr(T)(T[] arr) {
for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; ) {
auto aux = *x;
*x++ = *y;
*y-- = aux;
}
}
void reverseArray(T)(T[] arr) {
for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
swap(*x++, *y--);
}
void main() {
auto a = [10, 20, 30, 40];
foreach(x; a) printf("%d ", x); printf("\n");
a.reverseArr();
foreach(x; a) printf("%d ", x); printf("\n");
a.reverse();
foreach(x; a) printf("%d ", x); printf("\n");
a.reverseArray();
foreach(x; a) printf("%d ", x); printf("\n");
}
And this is the asm of the reverseArray:
reverseArray:
push EBX
mov ECX, 0Ch[ESP]
mov EBX, 0Ch[ESP]
push ESI
mov ESI, 0Ch[ESP]
lea ESI, -4[ESI*4][ECX]
push EDI
cmp ECX, ESI
jae L30
L1A: mov EAX, EBX
mov EDI, ESI
mov EDX, [EAX]
mov ECX, [EDI]
add EBX, 4
sub ESI, 4
mov [EAX], ECX
cmp EBX, ESI
mov [EDI], EDX
jb L1A
L30: pop EDI
pop ESI
pop EBX
ret 8
So the inner loop is 10 asm instructions instead of the 8 asm
instructions of the loop of reverseArr(), that is 2 more register
swaps, that are performed very quickly by the hardware register
renaming in the modern CPUs. So this is a small difference,
probably acceptable in many situations. While the difference
between the performance of reverseArray() and
std.algorithm.reverse is significant.
Related, the starting condition, as written: "y = &arr[$ - 1]"
will choke on empty ranges.
I'd have suggested: "y = arr.ptr + arr.length - 1", but that
will also choke if the arr.ptr is null. I'd say you should just
add an empty short-circuit test int there.
Right. But my study was mostly about what's inside the loop of
those functions. What's outside the loop is not influencing
performance significantly.
Bye,
bearophile