I've been vaguely aware of D for many years, but the recent
addition of std.experimental.ndslice finally inspired me to give
it a try. My main expertise lies in scientific computing, and I
primarily use Python, Julia, and C++, where multidimensional
arrays can be handled with a great deal of expressiveness and
flexibility. Before writing anything serious,
I wanted to get a sense for the kind of code I would have to
write to get the best performance for numerical calculations, so
I wrote a trivial summation benchmark. The following code gave me
slightly surprising results:
import std.stdio;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.range;
import std.experimental.ndslice;
void main() {
    int N = 1000;
    int Q = 20;
    int times = 1_000;
    double[] res1 = uninitializedArray!(double[])(N);
    double[] res2 = uninitializedArray!(double[])(N);
    double[] res3 = uninitializedArray!(double[])(N);
    auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
    StopWatch sw;
    double t0, t1, t2;

    // Benchmark 1: per-row sum via std.algorithm.sum
    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res1[i] = sumtest1(f[i]);
        }
    }
    sw.stop();
    t0 = sw.peek().msecs;
    sw.reset();

    // Benchmark 2: per-row sum with an explicit foreach loop
    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res2[i] = sumtest2(f[i]);
        }
    }
    sw.stop();
    t1 = sw.peek().msecs;
    sw.reset();

    // Benchmark 3: a single call that sums all rows with nested loops
    sw.start();
    foreach (unused; 0 .. times) {
        sumtest3(f, res3, N, Q);
    }
    sw.stop();
    t2 = sw.peek().msecs;

    writeln(t0, " ms");
    writeln(t1, " ms");
    writeln(t2, " ms");

    assert(res1 == res2);
    assert(res2 == res3);
}
auto sumtest1(Range)(Range range) @safe pure nothrow @nogc {
    return range.sum;
}
auto sumtest2(Range)(Range f) @safe pure nothrow @nogc {
    double retval = 0.0;
    foreach (double f_; f) {
        retval += f_;
    }
    return retval;
}
auto sumtest3(Range)(Range f, double[] retval, int N, int Q) @safe pure nothrow @nogc {
    for (int i = 0; i < N; ++i) {
        retval[i] = 0.0;               // res3 starts out uninitialized
        for (int j = 0; j < Q; ++j) {
            retval[i] += f[i, j];
        }
    }
}
When I compiled it using dmd -release -inline -O -noboundscheck
../src/main.d, I got the following timings:
1268 ms
312 ms
271 ms
While reading up on the language I had gathered that explicit
loops are generally frowned upon in D, and that ranges are
supposed to make them unnecessary without giving up performance.
Nevertheless, the two functions with explicit loops are faster by
a factor of 4 or more. Furthermore, the difference between
sumtest2 and sumtest3 seems to indicate that the per-row function
calls carry significant overhead. I also tried
f.reduce!((a, b) => a + b) instead of f.sum in sumtest1, but that
was even slower (see the sketch below). I have not tried the
GDC/LDC compilers yet, since, last I checked, they lag behind on
the standard library and do not include the ndslice package.
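For completeness, the reduce-based variant was essentially the
following sketch (the helper name sumtest1_reduce is just for
illustration; since a seedless reduce can throw on an empty
range, the nothrow/@nogc attributes from sumtest1 may need to be
relaxed):

import std.algorithm : reduce;

// Same reduction as sumtest1, but spelled with an explicit lambda
// instead of std.algorithm.sum; in my runs this was slower than both
// range.sum and the hand-written loops.
auto sumtest1_reduce(Range)(Range range) {
    return range.reduce!((a, b) => a + b);
}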
Now, given that my experience writing D amounts to literally a
few hours: is there anything I did blatantly wrong? Did I miss
any optimizations? Most importantly, can the elegant
operator-chaining style generally be made as fast as the explicit
loops we've all been writing for decades?
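For concreteness, the kind of chained style I have in mind for
this benchmark is roughly the following sketch (I'm assuming a
2-D slice can be iterated as a range of its rows, and the helper
name rowSums is mine):

import std.algorithm : map, sum;
import std.array : array;

// Compute every row sum of a 2-D slice in a single range expression;
// the result should match res1/res2/res3 from the benchmark above.
auto rowSums(S)(S f) {
    return f.map!(row => row.sum).array;
}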