On Wednesday, 15 June 2016 at 12:10:32 UTC, Seb wrote:
As I said, you can avoid the copy (see below). I also profiled it
a bit, and it was interesting to see that 50% of the runtime is
spent on generating the random matrix. On my machine both
scripts now take 1.5s when compiled with
I didn't benchmark the RNG, but I did notice that generating the
matrix took a lot of time. For now I am focused on the BLAS side
of things.
I am puzzled about how your code works:
Firstly:
I didn't know that you could substitute an array for its first
element in D, though I am aware that in C passing an array is
equivalent to passing a pointer to its first element.
auto matrix_mult(T)(T[] A, T[] B, Slice!(2, T*) a, Slice!(2, T*) b){
    ...
    gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans,
         M, N, K, 1., A.ptr, K, B.ptr, N, 0, C.ptr, N);
    return C.sliced(M, N);
}
Secondly:
I am especially puzzled about using the second element to stand
in for the slice itself. How does that work? And where can I find
more cool tricks like that?
void main()
{
    ...
    auto C = matrix_mult(ta[0], tb[0], ta[1], tb[1]);
    sw.stop();
    writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}
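Here is a minimal sketch of my current guess at what is going on. I am assuming that ta and tb are std.typecons tuples pairing the flat array with a view over it, in which case ta[0] selects a tuple field rather than an array element. The elided code may well do something different. The makePair helper is made up for illustration, and a plain sub-array stands in for the ndslice Slice.

```d
import std.typecons : tuple;

// Hypothetical helper (not from the original code): returns the flat
// array together with a view over its first two elements.
// Assumes data.length >= 2.
auto makePair(double[] data)
{
    return tuple(data, data[0 .. 2]);
}

void main()
{
    double[] data = [1.0, 2.0, 3.0, 4.0];
    auto ta = makePair(data);

    // Indexing a tuple with a compile-time constant picks a field,
    // so ta[0] is the whole array, not its first element.
    static assert(is(typeof(ta[0]) == double[]));
    assert(ta[0] is data);          // same memory, no copy

    // .ptr is the address of the array's first element, which is
    // what a C function such as gemm expects.
    assert(ta[0].ptr == &data[0]);

    // The second field is the view; it shares the same memory.
    assert(ta[1].length == 2);
    assert(ta[1].ptr == data.ptr);
}
```

If that reading is right, nothing is being substituted for anything: matrix_mult simply receives the two arrays and the two slices as four separate arguments.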
Many thanks!