Re: Programming language benchmarks

bearophile Thu, 28 Apr 2011 04:52:08 -0700

Moritz Warning:

> I think this might be interesting; it's also using D.


I was about to post this :-)
The little benchmark:
http://attractivechaos.wordpress.com/2011/04/25/my-programming-language-benchmarks-plb/
http://attractivechaos.github.com/plb/

(The matrix mul code is not a realistic example of code because people use 
library code to do this operation.)

Original C code:

double **mm_mul(int n, double *const *a, double *const *b)
{
    int i, j, k;
    double **m, **c;
    m = mm_init(n); c = mm_init(n);
    for (i = 0; i < n; ++i) // transpose
        for (j = 0; j < n; ++j)
            c[i][j] = b[j][i];
    for (i = 0; i < n; ++i) {
        double *p = a[i], *q = m[i];
        for (j = 0; j < n; ++j) {
            double t = 0.0, *r = c[j];
            for (k = 0; k < n; ++k)
                t += p[k] * r[k];
            q[j] = t;
        }
    }
    mm_destroy(n, c);
    return m;
}


The original D1 code of the matrix mul, coming from the Java version:

double[][] matmul(double[][] a, double[][] b) {
    int m = a.length, n = a[0].length, p = b[0].length;
    double[][] x, c;
    x.length = m; c.length = p;
    for (int i = 0; i < m; ++i) x[i].length = p;
    for (int i = 0; i < p; ++i) c[i].length = n;
    for (int i = 0; i < n; ++i) // transpose
        for (int j = 0; j < p; ++j)
            c[j][i] = b[i][j];
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < p; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += a[i][k] * c[j][k];
            x[i][j] = s;
        }
    return x;
}



Improved D2 code:
http://codepad.org/aifQYhjI


pure nothrow double[][] matMul(in double[][] a, in double[][] b) {
    immutable int m = a.length,
                  n = a[0].length,
                  p = b[0].length;

    // transpose
    auto c = new double[][](p, n);
    foreach (i, brow; b)
        foreach (j, bx; brow)
            c[j][i] = bx;

    auto x = new double[][](m, p);

    foreach (i, arow; a)
        foreach (j, crow; c) {
            // x[i][j] = std.numeric.dotProduct(arow, crow); // right way
            double s = 0.0;
            foreach (k, arowk; arow)
                s += arowk * crow[k];
            x[i][j] = s;
        }

    return x;
}



Lua code:

function matrix.mul(a, b, n)
        local y = matrix.new(n, n)
        local c = matrix.T(b, n)
        for i = 1, n - 1 do
                for j = 1, n - 1 do
                        local sum = 0
                        for k = 0, n - 1 do sum = sum + a[i][k] * c[j][k] end
                        y[i][j] = sum
                end
        end
        return y
end

--------------------

Timings, seconds:
  ICC-12.0.3               C    1.8
  GCC-4.3.2                C    2.3
  GDC-0.24                 D    8.8 (DMD gives similar timings)
  Java@JRE-1.6.0_14        Java 2.9
  LuaJIT-2.0.0-beta6 (JIT) Lua  2.7

Improved D version: about 4.4 seconds


Inner loop GCC:

L25:
        fldl    (%edx,%eax,8)
        fmull   (%ecx,%eax,8)
        addl    $1, %eax
        cmpl    %ebx, %eax
        faddp   %st, %st(1)
        jne     L25


Inner loop DMD, original version:

L153:   mov     EDX,4[EBX]
                mov     EAX,[EBX]
                mov     EAX,024h[ESP]
                fld     qword ptr [ESI*8][EDX]
                mov     EDX,028h[ESP]
                mov     EAX,[EDI*8][EDX]
                mov     EDX,4[EDI*8][EDX]
                fmul    qword ptr [ESI*8][EDX]
                inc     ESI
                cmp     ESI,EBP
                fadd    qword ptr 034h[ESP]
                fstp    qword ptr 034h[ESP]
                jl      L153


Inner loop DMD, improved:

L172:   fld     qword ptr [EBX*8][ECX]
                fmul    qword ptr [EBX*8][ESI]
                inc     EBX
                cmp     EBX,04Ch[ESP]
                fadd    qword ptr 054h[ESP]
                fstp    qword ptr 054h[ESP]
                jb      L172


The asm produced by DMD perfoms too many memory accesses in the inner loop. Is 
it possible to improve the DMD backend to remove this problem?

Bye,
bearophile

Re: Programming language benchmarks

Reply via email to