On 03/04/2013 04:46 PM, jerro wrote:
A bit better version:
http://codepad.org/jhbYxEgU
I think this code is good compared to the original (there are better
algorithms).
You can make it much faster even without really changing the algorithm.
Just by reversing the order of inner two loops like this:
void matrixMult2(in int[][] m1, in int[][] m2, int[][] m3) pure nothrow {
foreach (immutable i; 0 .. m1.length)
foreach (immutable k; 0 .. m2[0].length)
foreach (immutable j; 0 .. m3[0].length)
m3[i][j] += m1[i][k] * m2[k][j];
}
you can make the code much more cache friendly (because now you aren't
iterating any matrix by column in the inner loop) and also allow the
compiler to do auto vectorization. matrixMul2() takes 2.6 seconds on my
machine and matrixMul()takes 72 seconds (both compiled with gdmd -O
-inline -release -noboundscheck -mavx).
This isn't really relevant to the comparison with C++ in this thread, I
just thought it may be useful for anyone writing matrix code.
This is a little faster for 2000x2000 matrices and typical cache size:
void matrixMult(in int[][] m1, in int[][] m2, int[][] m3){
enum block=50; // optimal parameter depends on hardware
foreach(ib;iota(0,m1.length,block))
foreach(kb;iota(0,m2[0].length,block))
foreach(i;iota(ib,min(m1.length,ib+block)))
foreach(k;iota(kb,min(m2[0].length,kb+block)))
m3[i][] += m1[i][k] * m2[k][];
}