One thing that I would argue for with any new linear algebra library is some
sort of 'out' parameter, like
[numpy](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments)
has for its operations that produce a vector. In an inner loop of a project
using linalg, I had an operation like:
# simplified
const word2vec_size = 300
var oldVector: Vector64[word2vec_size]
for i in 0 ..< NUM_STEPS:
  oldVector += someOtherVector
When I changed this to:
const word2vec_size = 300
var oldVector: Vector64[word2vec_size]
for i in 0 ..< NUM_STEPS:
  oldVector += learning_rate * someOtherVector
I had an immediate slowdown - which (duh, my fault) was because in order to
scale 'someOtherVector' and add it to the original, a temporary vector had to
be created to hold the result of scaling it by the learning rate. The
easiest way of working around this was to make some calls into nimblas that
gave a pre-allocated vector to store the result of vector-scalar
multiplication.
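For comparison, numpy has the same behaviour: `a += lr * b` allocates a fresh temporary for `lr * b` on every iteration, while passing a pre-allocated buffer through the `out` keyword keeps the inner loop allocation-free. A minimal sketch (the array names and sizes here are just illustrative):

```python
import numpy as np

lr = 0.05
a = np.ones(300)
b = np.full(300, 2.0)

# naive update: 'lr * b' allocates a temporary vector every time
a += lr * b

# reusing a scratch buffer: np.multiply writes into 'tmp' in place,
# so the loop body performs no allocations
tmp = np.empty_like(b)
for _ in range(10):
    np.multiply(b, lr, out=tmp)
    a += tmp
```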
Having two multiply procs (with/without 'out') or one with an optional 'out'
parameter like:
import nimblas
# 'in' and 'out' are reserved keywords in Nim, so the parameters are
# renamed here; 'fp' and 'newVectorTypeLike' are assumed helpers
proc multiply*(scale_val: float64; input: VectorType;
               output: VectorType = nil): VectorType =
  # VectorType has to be a reference type so that returning it
  # doesn't result in a copy (and so that nil is a valid default)
  result = output
  if result == nil:
    result = newVectorTypeLike(input)
  copy(len(input), input.fp, 1, result.fp, 1)
  scal(len(input), scale_val, result.fp, 1)
would really help users avoid allocations.
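numpy's ufuncs already follow exactly this allocate-only-when-out-is-None pattern, so a Python analogue of the proposed proc (the function name is purely illustrative) is a thin wrapper:

```python
import numpy as np

def multiply(scale_val, vec, out=None):
    """Scale 'vec' by 'scale_val', writing into 'out' when supplied."""
    if out is None:
        # allocate only when the caller provides no buffer
        out = np.empty_like(vec)
    np.multiply(vec, scale_val, out=out)
    return out

v = np.arange(4, dtype=np.float64)   # [0.0, 1.0, 2.0, 3.0]
fresh = multiply(2.0, v)             # allocates a result vector
buf = np.empty_like(v)
reused = multiply(2.0, v, out=buf)   # writes into the caller's buffer
```

Returning the buffer either way lets callers chain the call or ignore the distinction entirely, which is the ergonomics the optional-'out' design is after.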