Hello again, I am having great fun optimizing the inner loop of my code for full covariance matrices in variational Bayes GMMs. The right call to a BLAS wrapper at a critical point makes all the difference.
For building the covariance matrices, one ends up summing a lot of `Δ * Δ'` constructs, where Δ is a difference vector of some kind. I get a major speed improvement from using `Base.BLAS.syrk!()`, which fills only the upper triangle of the scatter matrix. Because I later need these as regular dense arrays, I want to copy the upper triangle into the lower one. That is easily done, see this gist <https://gist.github.com/davidavdav/e04f7f34b78c22dbe1cc>.

I first tried `Symmetric(s)`, which shows the matrix fine in the REPL, but otherwise it seems useless for further processing: the normal matrix manipulation operations are not implemented (e.g., you can't add a scalar to the matrix). I've tried all kinds of operations (`convert(Matrix{Float64}, )`, `collect()`, `float()`, etc.) to convert it to a dense array, but I couldn't get any of them to work. The question is: what is the preferred way to manipulate matrices that are the result of `syrk()`, and does `::Symmetric` play a role in there?

I get these Δs from vectors stored as rows of arrays (perhaps a bad design choice), and I found that the direct matrix slice `x[i,:]`, as well as `sub(x, i, :)` and `view(x, i, :)`, all perform very slowly when used further in the matrix multiplication construct. A very simple function `rvec!(y, x, i)` (see the gist <https://gist.github.com/davidavdav/e04f7f34b78c22dbe1cc>), effectively equivalent to `y = x[i,:]'` on a pre-allocated `y`, performs many times faster than the getindex, sub and view equivalents. I cannot imagine that `rvec!()` is so ingenious (it can probably be improved using `blascopy!`), so my question is: what is the Base or ArrayViews way of selecting a row vector efficiently and producing a scatter matrix from it using `syrk!()`?

Cheers,

---david
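
For concreteness, the pattern described in this post can be sketched roughly as below. This is only an illustration, not the code from the gist: `rvec!` here is a guess at what such a helper looks like, `symmetrize!` and `scatter` are hypothetical names, and the mean vector `μ` is an assumed source for the differences. It is written for current Julia, where the BLAS wrappers live in the `LinearAlgebra` standard library rather than in `Base.BLAS`.

```julia
using LinearAlgebra

# Copy row i of x into the preallocated buffer y (hypothetical stand-in for
# the gist's rvec!; y may be a Vector or a d×1 Matrix, indexed linearly).
function rvec!(y::Array{Float64}, x::Matrix{Float64}, i::Integer)
    length(y) == size(x, 2) || throw(DimensionMismatch("y needs one entry per column of x"))
    @inbounds for j in 1:size(x, 2)
        y[j] = x[i, j]
    end
    return y
end

# Mirror the upper triangle of a square matrix into its lower triangle, in place.
function symmetrize!(S::Matrix{Float64})
    n = LinearAlgebra.checksquare(S)
    @inbounds for j in 1:n, i in (j + 1):n
        S[i, j] = S[j, i]
    end
    return S
end

# Accumulate S = Σ_i Δ_i Δ_i' with Δ_i = x[i, :] - μ (μ is an assumption here).
# syrk! with uplo = 'U' only touches the upper triangle of S.
function scatter(x::Matrix{Float64}, μ::Vector{Float64})
    n, d = size(x)
    S = zeros(d, d)
    Δ = zeros(d, 1)                 # d×1 buffer, reused for every row
    for i in 1:n
        rvec!(Δ, x, i)
        for j in 1:d
            Δ[j] -= μ[j]
        end
        # S := 1.0 * Δ * Δ' + 1.0 * S, upper triangle only
        BLAS.syrk!('U', 'N', 1.0, Δ, 1.0, S)
    end
    return symmetrize!(S)           # fill the lower triangle once, at the end
end
```

The lower triangle is filled only once after the loop, so the per-row cost stays with the copy and the `syrk!` call. On current Julia, `Matrix(Symmetric(S, :U))` should give the same dense result as the explicit mirroring loop; whether that conversion was available in the Julia version this post was written against is not something the sketch assumes.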
