On Monday, 4 March 2013 at 15:57:42 UTC, jerro wrote:
matrixMul2() takes 2.6 seconds on my machine and matrixMul() takes 72 seconds (both compiled with gdmd -O -inline -release -noboundscheck -mavx).

Thanks Jerro. You made me realize that help from the experts could be quite useful. I plugged in a call to the BLAS matrix multiply routine, which SciD conveniently binds.

The result? My 2000x2000 matrix multiply went from 98 seconds down to 1.8 seconds. It's just hilariously faster to use 20 years of numerical experts' optimized code than to try to write your own.


// screaming fast version - uses BLAS for 50x speedup over naive code.
//
Multipliable!(T) mmult2(T)(ref Multipliable!(T) m1,
                         ref Multipliable!(T) m2,
                         ref Multipliable!(T) m3) {
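    // Check that the shapes are compatible before touching the output.
    assert(m1.cols == m2.rows);

    // With beta == 0, dgemm overwrites C, so this zeroing is not strictly required.
    m3.array[] = 0;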

    char ntran = 'N';
    double one = 1.0;
    double zero = 0.0;
    int nrow = cast(int)m1.rows;
    int ncol = cast(int)m1.cols;
    int mcol = cast(int)m2.cols;

    scid.bindings.blas.blas.dgemm_(&ntran, // transa
                                   &ntran, // transb
                                   &nrow,  // m
                                   &mcol,  // n
                                   &ncol,  // k
                                   &one,   // alpha
                                   m1.array.ptr, // A
                                   &nrow,        // lda
                                   m2.array.ptr, // B
                                   &ncol,        // ldb
                                   &zero,        // beta
                                   m3.array.ptr, // C
                                   &nrow,        // ldc
                                   1,            // transa_len (length of the 1-char transa string)
                                   1);           // transb_len (length of the 1-char transb string)
    return m3;
}
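
For anyone wondering what the dgemm_ call actually computes, here is a minimal sketch of the same operation, C = alpha*A*B + beta*C, as a plain triple loop. The name naiveGemm and its parameter list are made up for illustration and are not part of SciD or BLAS. It assumes column-major (Fortran) storage, which is what BLAS expects, so element (i, j) of a matrix with leading dimension ld sits at index i + j*ld. Whether Multipliable stores its data that way is something I have not checked.

```d
// Hypothetical reference routine (not part of SciD or BLAS): computes
// C = alpha*A*B + beta*C for column-major arrays, i.e. what dgemm does
// when both transpose flags are 'N'.
void naiveGemm(int m, int n, int k,
               double alpha, const(double)[] a, int lda,
               const(double)[] b, int ldb,
               double beta, double[] c, int ldc)
{
    foreach (j; 0 .. n)
        foreach (i; 0 .. m)
        {
            // Dot product of row i of A with column j of B.
            double sum = 0.0;
            foreach (p; 0 .. k)
                sum += a[i + p * lda] * b[p + j * ldb];

            // When beta is zero the old contents of C are never read,
            // so C does not need to be initialised by the caller.
            c[i + j * ldc] = (beta == 0.0)
                ? alpha * sum
                : alpha * sum + beta * c[i + j * ldc];
        }
}

void main()
{
    // A = [1 2 3; 4 5 6] (2x3), stored column by column.
    double[] a = [1, 4, 2, 5, 3, 6];
    // B = [7 8; 9 10; 11 12] (3x2), stored column by column.
    double[] b = [7, 9, 11, 8, 10, 12];
    // new double[4] is NaN-filled in D; that is fine because beta == 0.
    double[] c = new double[4];

    naiveGemm(2, 2, 3, 1.0, a, 2, b, 3, 0.0, c, 2);

    // A*B = [58 64; 139 154], again in column-major order.
    assert(c == [58.0, 139, 64, 154]);
}
```

The loop does the same arithmetic as the BLAS routine, only without the blocking and vectorization that make the library version so much faster. Running it on a small case like the one in main is a cheap way to check that the m, n, k and leading-dimension arguments in the real call are the right way round.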
