On Wednesday, 15 June 2016 at 13:13:05 UTC, data pulverizer wrote:
On Wednesday, 15 June 2016 at 12:10:32 UTC, Seb wrote:
As said you can avoid the copy (see below). I also profiled it a bit and it was interesting to see that 50% of the runtime are spent on generating the random matrix. On my machine now both scripts take 1.5s when compiled with

I didn't benchmark the RNG but I did notice it took a lot of time to generate the matrix but for now I am focused on the BLAS side of things.

I am puzzled about how your code works:

Firstly:
I didn't know that you could substitute an array for its first element in D though I am aware that a pointer to an array's first element is equivalent to passing the array in C.

auto matrix_mult(T)(T[] A, T[] B, Slice!(2, T*) a, Slice!(2, T*) b){
        ...
gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans, M, N, K, 1., A.ptr, K, B.ptr, N, 0, C.ptr, N);
        return C.sliced(M, N);
}


You wrote that too :-)
For more infos see:

https://dlang.org/spec/arrays.html

However that's very dangerous, so use just slices wherever you can.

Secondly:
I am especially puzzled about using the second element to stand in for the slice itself. How does that work? And where can I find more cool tricks like that?

void main()
{
        ...
        auto C = matrix_mult(ta[0], tb[0], ta[1], tb[1]);
        sw.stop();
        writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}


Many thanks!


Btw you don't even need to save tuples, the pointer is already saved in the slice ;-) N.b: afaik you need the latest version of mir, because std.experimental.ndslice in 2.071 doesn't expose the `.ptr` (yet).


// Random matrix
auto rmat(T)(ulong nrow, ulong ncol, T min, T max){
        return runif(nrow*ncol, min, max).sliced(nrow, ncol);
}

auto matrix_mult(T)(Slice!(2, T*) a, Slice!(2, T*) b){
        int M = to!int(a.shape[0]);
        int K = to!int(a.shape[1]);
        int N = to!int(b.shape[1]);
        int n_el = to!int(a.elementsCount);
        T[] C = new T[M*N];
gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans, M, N, K, 1., a.ptr, K, b.ptr, N, 0, C.ptr, N);
        return C.sliced(M, N);
}

void main()
{
        int n = 4000;
        auto A = rmat(n, n, 0., 1.);
        auto B = rmat(n, n, 0., 1. );
        StopWatch sw;
        sw.start();
        auto C = matrix_mult(A, B);
        sw.stop();
        writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}

If you really want to get the original T[] back, you could use something like

```
T[] a = slice.ptr[0.. slice.elementsCount];
```

but for most cases `byElement` would be a lot better, because all transformations etc are of course only applied to your view.

And where can I find more cool tricks like that?

Browse the source code and the unittests. Phobos is an amazing resource :)

Reply via email to