One thing that I would argue for with any new linear algebra library is some 
sort of 'out' parameter, like 
[numpy](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments)
 has for its operations that produce a vector. In an inner loop of a project 
using linalg, I had an operation like:
    
    
    # simplified
    const word2vec_size = 300
    var oldVector: Vector64[word2vec_size]
    for i in 0 ..< NUM_STEPS:
      oldVector += someOtherVector
    

when I changed this to: 
    
    
    const word2vec_size = 300
    var oldVector: Vector64[word2vec_size]
    for i in 0 ..< NUM_STEPS:
      oldVector += learning_rate * someOtherVector
    

I saw an immediate slowdown - which (duh, my fault) was because, in order to 
scale 'someOtherVector' and add it to the original, a temporary vector had to 
be allocated on every iteration to hold the result of scaling it by the 
learning rate. The easiest way of working around it was to make some calls 
into nimblas that gave a pre-allocated vector to store the result of the 
vector-scalar multiplication. Having two multiply procs (with/without 'out') 
or one with an optional 'out' parameter like:
    
    
    import nimblas
    
    # 'in' and 'out' are reserved words in Nim, so the parameters are
    # named 'a' and 'dest' here
    proc multiply*(scale_val: float, a: VectorType, dest: VectorType = nil): VectorType =
      var dest = dest
      if dest == nil:
        # VectorType has to be a reference type so that nil can be the
        # default and so that returning it doesn't result in a copy
        dest = newVectorTypeLike(a)
    
      copy(len(a), a.fp, 1, dest.fp, 1)    # dest <- a
      scal(len(a), scale_val, dest.fp, 1)  # dest <- scale_val * dest
      result = dest
    
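With a proc like this, the inner loop could allocate one scratch vector up front and reuse it every iteration (a sketch using the same hypothetical names as above - `newVectorTypeLike`, `NUM_STEPS` and friends aren't real linalg API):

```nim
const word2vec_size = 300
var
  oldVector: Vector64[word2vec_size]
  scratch = newVectorTypeLike(someOtherVector)  # allocated once, outside the loop

for i in 0 ..< NUM_STEPS:
  # writes into scratch instead of allocating a temporary each iteration
  discard multiply(learning_rate, someOtherVector, scratch)
  oldVector += scratch
```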

would really help users avoid allocations. 
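For reference, this is exactly the pattern numpy's `out` keyword enables - the scratch array is written in place, so the loop does no per-iteration allocation:

```python
import numpy as np

learning_rate = 0.01
someOtherVector = np.ones(300)
oldVector = np.zeros(300)
scratch = np.empty(300)  # allocated once, outside the loop

for _ in range(1000):
    # scratch <- learning_rate * someOtherVector, written in place
    np.multiply(someOtherVector, learning_rate, out=scratch)
    oldVector += scratch  # in-place add, no temporary either
```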
