On Wednesday, 15 June 2016 at 12:10:32 UTC, Seb wrote:
As said, you can avoid the copy (see below). I also profiled it a bit, and it was interesting to see that 50% of the runtime is spent on generating the random matrix. On my machine both scripts now take 1.5s when compiled with

I didn't benchmark the RNG, though I did notice that generating the matrix took a lot of time. For now I am focused on the BLAS side of things.

I am puzzled about how your code works:

Firstly:
I didn't know that in D you could substitute an array for its first element, though I am aware that in C passing an array is equivalent to passing a pointer to its first element.

auto matrix_mult(T)(T[] A, T[] B, Slice!(2, T*) a, Slice!(2, T*) b){
        ...
        gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans, M, N, K, 1., A.ptr, K, B.ptr, N, 0, C.ptr, N);
        return C.sliced(M, N);
}


Secondly:
I am especially puzzled about using the second element to stand in for the slice itself. How does that work? And where can I find more cool tricks like that?

void main()
{
        ...
        auto C = matrix_mult(ta[0], tb[0], ta[1], tb[1]);
        sw.stop();
        writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}


Many thanks!
