Yes, there have been some hints of regressions in constructing SubArrays (see 
https://github.com/JuliaLang/julia/issues/11787), but I haven't had the time 
to do a bisect or anything.

--Tim

On Sunday, June 21, 2015 02:07:29 PM Douglas Bates wrote:
> After profiling some code that I will discuss in a lightning talk at
> JuliaCon I found a bottleneck in an unexpected place.  (I guess I should
> know by now that profiling always shows that the bottleneck is in an
> unexpected place - that's why we do it.)
> 
> I have a matrix of matrices, some of which are dense and some sparse, for
> which I wish to create the lower Cholesky factor.  This operation is done
> once for each evaluation of the objective function in a nonlinear
> optimization, so I want to be careful about creating copies.  The chol!
> fallback methods (the argument is an AbstractMatrix) in
> base/linalg/cholesky.jl have operations like
> 
>                 A[k,k] -= A[k,i]*A[k,i]'
> 
> and
> 
>                         A[i,k] -= A[i,j]*A[k,j]'*AkkInv'
> 
> which, when the elements of A are themselves matrices, I rewrote as
> downdate!(A[k,k], A[k,i]), etc.
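> 
> For the all-dense cases, a minimal sketch of methods along these lines
> (the syrk!/gemm! mapping here is my illustration, not necessarily the
> exact code) would be:
> 
> function downdate!{T<:FloatingPoint}(C::DenseMatrix{T}, A::DenseMatrix{T})
>     # lower triangle of C -= A*A', a symmetric rank-k downdate
>     BLAS.syrk!('L', 'N', -one(T), A, one(T), C)
> end
> 
> function downdate!{T<:FloatingPoint}(C::DenseMatrix{T}, A::DenseMatrix{T},
>                                      B::DenseMatrix{T})
>     # C -= A*B'
>     BLAS.gemm!('N', 'T', -one(T), A, B, one(T), C)
> end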
> 
> One of these methods is
> 
> function downdate!{T<:FloatingPoint}(C::DenseMatrix{T}, A::DenseMatrix{T},
>                                      B::SparseMatrixCSC{T})
>     ma, na = size(A)
>     mb, nb = size(B)
>     ma == size(C,1) && mb == size(C,2) && na == nb ||
>         throw(DimensionMismatch(""))
>     rvb = rowvals(B); nzb = nonzeros(B)
>     for j in 1:nb
>         aj = sub(A, :, j)       # view of column j of A
>         ib = nzrange(B, j)      # positions of the stored entries in column j of B
>         rvbj = sub(rvb, ib)     # row indices of those entries
>         nzbj = sub(nzb, ib)     # their values
>         for k in eachindex(ib)
>             # C[:, rvbj[k]] -= nzbj[k] * A[:, j]
>             BLAS.axpy!(-nzbj[k], aj, sub(C, :, Int(rvbj[k])))
>         end
>     end
>     C
> end
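> 
> As a quick sanity check (made-up sizes):
> 
> A = rand(4, 6)
> B = sprand(5, 6, 0.3)
> C = A*full(B)' + eye(4, 5)
> downdate!(C, A, B)      # C should now be approximately eye(4, 5)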
> 
> The surprising bottleneck is the call to sub(C,:,Int(rvbj[k])) within the
> call to BLAS.axpy!.  Fortunately the BLAS.axpy! method signatures are
> flexible enough that I think I can work around this with some pointer
> arithmetic and pointer_to_array.  At present the sub operation seems to be
> taking about 2/3 of the time in the BLAS.axpy! call.  That is, the linear
> algebra is much faster than determining which piece of the array to work on.
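> 
> A minimal sketch of such a workaround, assuming the low-level six-argument
> BLAS.axpy! method that accepts raw pointers and strides (here passing the
> column pointers directly rather than wrapping them with pointer_to_array):
> 
> function downdate2!{T<:FloatingPoint}(C::DenseMatrix{T}, A::DenseMatrix{T},
>                                       B::SparseMatrixCSC{T})
>     ma, na = size(A)
>     mb, nb = size(B)
>     ma == size(C,1) && mb == size(C,2) && na == nb ||
>         throw(DimensionMismatch(""))
>     rvb = rowvals(B); nzb = nonzeros(B)
>     pa = pointer(A); pc = pointer(C)
>     colbytes = ma*sizeof(T)                   # byte offset between columns
>     for j in 1:nb
>         paj = pa + (j - 1)*colbytes           # pointer to column j of A
>         for k in nzrange(B, j)
>             pck = pc + (rvb[k] - 1)*colbytes  # pointer to column rvb[k] of C
>             BLAS.axpy!(ma, -nzb[k], paj, 1, pck, 1)
>         end
>     end
>     C
> end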
