> Agreed that a range should never be converted into a vector if it's not 
> strictly necessary, and I can't understand why it should be necessary in this 
> case. Our dense array indexing code does not do this anywhere, I think, and 
> as 
> you point out it's even more important not to do this for sparse arrays.

(For some reason, I have done this convert-range-to-array-for-indexing
in my own code.  I thought, I'd picked it up from Base but apparently
not.  Just my bad habit then!  oops...)

> Care to turn this into a pull request? A 100x speedup is not too shabby!

Will do.  Although, I'll want to have a better look at it first, before
making that pull-request.

> --Tim
>
> On Tuesday, March 18, 2014 09:53:24 AM Mauro wrote:
>> Yes, this is bad.  I think this stems from the bad habit to change a
>> range inside the getindex function into a vector.  Whilst this is ok-ish
>> for dense matrices its definitely no good for sparse matrices.  In this
>> case a [1:n] is made and on top of that compared to 1:m!
>> 
>> Redefining the called getindex function in base/sparse/sparsematrix.jl
>> to
>> 
>> function getindex{Tv,Ti}(A::SparseMatrixCSC{Tv,Ti}, I::AbstractVector,
>> J::AbstractVector) m = size(A, 1)
>>     if I == 1:m
>>         return Base.SparseMatrix.getindex_cols(A, J)
>>     end
>> 
>>     if isa(I, Range) || isa(I, Range1); I = [I]; end
>> 
>>     if issorted(I)
>>         return getindex_I_sorted(A, I, J)
>>     else
>>         return getindex_general(A, I, J)
>>     end
>> 
>> end
>> 
>> improves speed for me by almost 100x from 2.5s to 0.029s.  I think for
>> getindex is now proportional to non-zeros.
>> 
>> Note above is just a quick hack.  For instance the issorted call is also
>> unnecessary if it is a range... probably should tidy this up.  Does
>> anyone know of a reference paper/implementation for this?
>> 
>> On Mon, 2014-03-17 at 23:57, [email protected] wrote:
>> > It seems to me that indexing sparse matrices, like
>> > m[i,:] is way too slow, even if one is retrieving columns.
>> > In fact, it seems to take time proportional to the dimension
>> > of the matrix, rather than to the number of non-zeros in the column.
>> > 
>> > For example, consider the following timings:
>> > 
>> > n = 10000;
>> > e = speye(n);
>> > x = 0;
>> > tic();
>> > for i = 1:n; x = x + sum(find(e[:,i])); end
>> > toc()
>> > 
>> > elapsed time: 1.388211586 seconds
>> > 
>> > n = 20000;
>> > e = speye(n);
>> > x = 0;
>> > tic();
>> > for i = 1:n; x = x + sum(find(e[:,i])); end
>> > toc()
>> > 
>> > elapsed time: 6.076672088 seconds
>> > 
>> > n = 40000;
>> > e = speye(n);
>> > x = 0;
>> > tic();
>> > for i = 1:n; x = x + sum(find(e[:,i])); end
>> > toc()
>> > 
>> > elapsed time: 22.653318711 seconds
>> > 
>> > Thanks,
>> > 
>> >   Dan Spielman

-- 

Reply via email to