On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
{
    return m.cols * i + j;
}

This is row-major order [1]. BTW: Why don't you make toIdx a member of Matrix? It saves one parameter. You may also define opIndex as

   ref T opIndex(in int r, in int c)

Then the innermost summation becomes more readable:

   m3[i, j] += m1[i, k] * m2[k, j];
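
To make the suggestion concrete, here is a minimal sketch of what such a Matrix could look like. The field names (data, rows, cols) and the constructor are assumptions for illustration, not taken from your code; only the index formula and the opIndex signature come from above.

```d
struct Matrix(T)
{
    T[] data;   // assumed: flat row-major storage
    int rows;   // assumed field name
    int cols;

    this(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new T[rows * cols];
        data[] = 0; // floating-point arrays default to NaN, so clear them
    }

    // Same formula as the free function, minus the Matrix parameter.
    int toIdx(in int i, in int j) const
    {
        return cols * i + j;
    }

    // Returning by ref lets m[i, j] = x and m[i, j] += x work directly.
    ref T opIndex(in int r, in int c)
    {
        return data[toIdx(r, c)];
    }
}

void main()
{
    auto m = Matrix!double(2, 3);
    m[1, 2] = 5.0;
    m[1, 2] += 1.5;
    assert(m.toIdx(1, 2) == 5);   // 3 * 1 + 2
    assert(m.data[5] == 6.5);
}
```

Because opIndex returns a reference, no separate opIndexAssign or opIndexOpAssign is needed for this simple case.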

How about performing an in-place transposition of m2 before performing the dot product? Then you can rewrite the innermost loop:

   m3[i, j] += m1[i, k] * m2[j, k]; // note: j and k swapped

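This should avoid the costly jumping through memory: in row-major order, m2[k, j] skips a whole row for every step of k, while m2[j, k] walks adjacent elements. A good starting point for a performance analysis would be to look at the assembly code of the innermost loop.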
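
A rough sketch of the idea on flat row-major arrays follows. Note that it makes a transposed copy instead of transposing in place, since in-place transposition is only straightforward for square matrices. The function names transposed and matmulTransposed are made up for this example.

```d
// Hypothetical helper: transposed copy of a rows x cols row-major array.
double[] transposed(const double[] a, int rows, int cols)
{
    auto t = new double[a.length];
    foreach (i; 0 .. rows)
        foreach (j; 0 .. cols)
            t[j * rows + i] = a[i * cols + j];
    return t;
}

// m1 is n x m, m2 is m x p, the result is n x p; everything is row-major.
double[] matmulTransposed(const double[] m1, const double[] m2,
                          int n, int m, int p)
{
    auto m2t = transposed(m2, m, p); // p x m
    auto m3 = new double[n * p];
    foreach (i; 0 .. n)
    {
        foreach (j; 0 .. p)
        {
            double sum = 0;
            // Both operands are read from consecutive memory locations.
            foreach (k; 0 .. m)
                sum += m1[i * m + k] * m2t[j * m + k];
            m3[i * p + j] = sum;
        }
    }
    return m3;
}

void main()
{
    double[] a = [1, 2, 3, 4, 5, 6];      // 2 x 3
    double[] b = [7, 8, 9, 10, 11, 12];   // 3 x 2
    auto c = matmulTransposed(a, b, 2, 3, 2);
    assert(c == [58.0, 64.0, 139.0, 154.0]);
}
```

Whether the extra copy pays off depends on the matrix sizes, so it is worth measuring both variants.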

[1] https://en.wikipedia.org/wiki/Row_major
