Agreed that a range should never be converted into a vector if it's not
strictly necessary, and I can't understand why it should be necessary in this
case. Our dense array indexing code does not do this anywhere, I think, and as
you point out it's even more important not to do this for sparse arrays.
Care to turn this into a pull request? A 100x speedup is not too shabby!
--Tim
On Tuesday, March 18, 2014 09:53:24 AM Mauro wrote:
> Yes, this is bad. I think this stems from the bad habit to change a
> range inside the getindex function into a vector. Whilst this is ok-ish
> for dense matrices its definitely no good for sparse matrices. In this
> case a [1:n] is made and on top of that compared to 1:m!
>
> Redefining the called getindex function in base/sparse/sparsematrix.jl
> to
>
> function getindex{Tv,Ti}(A::SparseMatrixCSC{Tv,Ti}, I::AbstractVector,
> J::AbstractVector) m = size(A, 1)
> if I == 1:m
> return Base.SparseMatrix.getindex_cols(A, J)
> end
>
> if isa(I, Range) || isa(I, Range1); I = [I]; end
>
> if issorted(I)
> return getindex_I_sorted(A, I, J)
> else
> return getindex_general(A, I, J)
> end
>
> end
>
> improves speed for me by almost 100x from 2.5s to 0.029s. I think for
> getindex is now proportional to non-zeros.
>
> Note above is just a quick hack. For instance the issorted call is also
> unnecessary if it is a range... probably should tidy this up. Does
> anyone know of a reference paper/implementation for this?
>
> On Mon, 2014-03-17 at 23:57, [email protected] wrote:
> > It seems to me that indexing sparse matrices, like
> > m[i,:] is way too slow, even if one is retrieving columns.
> > In fact, it seems to take time proportional to the dimension
> > of the matrix, rather than to the number of non-zeros in the column.
> >
> > For example, consider the following timings:
> >
> > n = 10000;
> > e = speye(n);
> > x = 0;
> > tic();
> > for i = 1:n; x = x + sum(find(e[:,i])); end
> > toc()
> >
> > elapsed time: 1.388211586 seconds
> >
> > n = 20000;
> > e = speye(n);
> > x = 0;
> > tic();
> > for i = 1:n; x = x + sum(find(e[:,i])); end
> > toc()
> >
> > elapsed time: 6.076672088 seconds
> >
> > n = 40000;
> > e = speye(n);
> > x = 0;
> > tic();
> > for i = 1:n; x = x + sum(find(e[:,i])); end
> > toc()
> >
> > elapsed time: 22.653318711 seconds
> >
> > Thanks,
> >
> > Dan Spielman