I've been vaguely aware of D for many years, but the recent
addition of std.experimental.ndslice finally inspired me to give
it a try. My main expertise lies in scientific computing, and I
primarily use Python, Julia, and C++, where multidimensional
arrays can be handled with a great deal of expressiveness and
flexibility. Before writing anything serious,
I wanted to get a sense for the kind of code I would have to
write to get the best performance for numerical calculations, so
I wrote a trivial summation benchmark. The following code gave me
slightly surprising results:
import std.stdio;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.range;
import std.experimental.ndslice;
void main() {
    int N = 1000;
    int Q = 20;
    int times = 1_000;
    double[] res1 = uninitializedArray!(double[])(N);
    double[] res2 = uninitializedArray!(double[])(N);
    double[] res3 = uninitializedArray!(double[])(N);
    auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
    StopWatch sw;
    double t0, t1, t2;

    // Benchmark 1: per-row sum via std.algorithm.sum
    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res1[i] = sumtest1(f[i]);
        }
    }
    sw.stop();
    t0 = sw.peek().msecs;
    sw.reset();

    // Benchmark 2: per-row sum with an explicit foreach loop
    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res2[i] = sumtest2(f[i]);
        }
    }
    sw.stop();
    t1 = sw.peek().msecs;
    sw.reset();

    // Benchmark 3: a single call that sums all rows with nested loops
    sw.start();
    foreach (unused; 0 .. times) {
        sumtest3(f, res3, N, Q);
    }
    sw.stop();
    t2 = sw.peek().msecs;

    writeln(t0, " ms");
    writeln(t1, " ms");
    writeln(t2, " ms");

    assert(res1 == res2);
    assert(res2 == res3);
}
auto sumtest1(Range)(Range range) @safe pure nothrow @nogc {
    return range.sum;
}
auto sumtest2(Range)(Range f) @safe pure nothrow @nogc {
    double retval = 0.0;
    foreach (double f_; f) {
        retval += f_;
    }
    return retval;
}
auto sumtest3(Range)(Range f, double[] retval, int N, int Q) @safe pure nothrow @nogc {
    for (int i = 0; i < N; ++i) {
        retval[i] = 0.0;               // res3 starts out uninitialized
        for (int j = 0; j < Q; ++j) {
            retval[i] += f[i, j];
        }
    }
}
When I compiled it using dmd -release -inline -O -noboundscheck
../src/main.d, I got the following timings:
1268 ms
312 ms
271 ms
While reading up on the language I had gathered that explicit
loops are generally frowned upon in D, and that ranges are
supposed to make them unnecessary without giving up performance.
Nevertheless, the two functions with explicit loops are faster by
a factor of 4 or more. Furthermore, the difference between
sumtest2 and sumtest3 seems to indicate that the per-row function
calls carry significant overhead. I also tried
f.reduce!((a, b) => a + b) instead of f.sum in sumtest1, but that
was even slower (see the sketch below). I have not tried the
GDC/LDC compilers yet, since, last I checked, they lag behind on
the standard library and do not include the ndslice package.
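For completeness, the reduce-based variant was essentially the
following sketch (the helper name sumtest1_reduce is just for
illustration; since a seedless reduce can throw on an empty
range, the nothrow/@nogc attributes from sumtest1 may need to be
relaxed):

import std.algorithm : reduce;

// Same reduction as sumtest1, but spelled with an explicit lambda
// instead of std.algorithm.sum; in my runs this was slower than both
// range.sum and the hand-written loops.
auto sumtest1_reduce(Range)(Range range) {
    return range.reduce!((a, b) => a + b);
}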
Now, given that my experience writing D amounts to literally a
few hours: is there anything I did blatantly wrong? Did I miss
any optimizations? Most importantly, can the elegant
operator-chaining style generally be made as fast as the explicit
loops we've all been writing for decades?
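For concreteness, the kind of chained style I have in mind for
this benchmark is roughly the following sketch (I'm assuming a
2-D slice can be iterated as a range of its rows, and the helper
name rowSums is mine):

import std.algorithm : map, sum;
import std.array : array;

// Compute every row sum of a 2-D slice in a single range expression;
// the result should match res1/res2/res3 from the benchmark above.
auto rowSums(S)(S f) {
    return f.map!(row => row.sum).array;
}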