Octave 3.6 just gave up:

octave:1> tic; sprand(700000, 700000, .001); toc;
error: memory exhausted or requested size too large for range of Octave's index 
type -- trying to return to prompt


-viral



On 30-Apr-2014, at 9:08 pm, Viral Shah <[email protected]> wrote:

> You can call SparseMatrixCSC directly, but then you have to do all the 
> arrangement and sorting yourself. Depending on your application and how the 
> nonzeros are generated, this may or may not help.
> 
> I will investigate this further. I now have all the information I need.
> 
> Thanks,
> 
> -viral
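> The constructor route can be sketched as follows (an editorial example with
> made-up values, not code from this thread): colptr has n+1 entries, and
> rowval must be sorted within each column.

```julia
# Hypothetical example: a 3x3 diagonal matrix assembled by filling the
# CSC fields directly.  Column j's stored entries live at positions
# colptr[j]:colptr[j+1]-1 of rowval/nzval, and rowval must be sorted
# within each column.
m, n = 3, 3
colptr = [1, 2, 3, 4]        # n+1 column pointers
rowval = [1, 2, 3]           # row index of each stored entry
nzval  = [5.0, 5.0, 5.0]     # the stored values
s = SparseMatrixCSC(m, n, colptr, rowval, nzval)
```

> As far as I know, the constructor does not verify these invariants, so
> malformed input produces a corrupt matrix rather than an error.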
> 
> 
> 
> On 30-Apr-2014, at 8:48 pm, Ryan Gardner <[email protected]> wrote:
> 
>> I've got 16GB of RAM on this machine.  My question, with admittedly little 
>> knowledge of the internal structure of the sparse arrays, is largely why 
>> generating the actual SparseMatrixCSC is so much slower than generating what 
>> is essentially another sparse matrix representation consisting of the 
>> indices and values.  (I realize that once we start swapping, which will 
>> happen in my example, things slow down a ton, but even the sprand I mention 
>> was slow.)  Do you observe the same results?  Is the reason for the 
>> difference clear to someone else?
>> 
>> Thanks for all the comments.  These are helpful.  It had not crossed my mind 
>> that I could control the data type of the indices.
>> 
>> Using the SparseMatrixCSC constructor directly would probably be very 
>> helpful.  Did you learn about that constructor from reading the source code, 
>> or is it documented somewhere else?
>> 
>> I'm also curious about where @inbounds was used.
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Apr 30, 2014 at 8:59 AM, Tony Kelman <[email protected]> wrote:
>> If you're assembling the matrix in row-sorted column-major order and there's 
>> no duplication, then you can also skip the conversion work by using the 
>> SparseMatrixCSC constructor directly.
>> 
>> 
>> On Wednesday, April 30, 2014 1:10:31 AM UTC-7, Viral Shah wrote:
>> Could you post your code? That will save me writing the same. :-) 
>> 
>> Was building the vectors taking all the time, or was it in building the 
>> sparse matrix from the triples? Triples to CSC conversion is an expensive 
>> operation, and we have spent a fair amount of time making it fast. Of 
>> course, there could be more opportunities for speeding up the code. 
>> 
>> Where did you use @inbounds and @simd? 
>> 
>> -viral 
>> 
>> 
>> 
>> On 30-Apr-2014, at 1:11 pm, Dominique Orban <[email protected]> wrote: 
>> 
>>> Downgrading the 700,000 to 70,000 for the sake of not waiting all night, 
>>> the original implementation takes about 4.3 seconds on my laptop. 
>>> Preallocating arrays and using @inbounds brings it down to about 0.6 
>>> seconds. @simd doesn't seem to provide any further speedup. Building the 
>>> sparse matrix takes about 3.8 seconds. Could this be due to the conversion 
>>> from triplet to CSC format? 
>>> 
>>> ps: using the original size of 700,000, Julia reports a memory usage of 
>>> 11.8GB. 
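>>> (Editorial sketch, assuming the shape of the original loop: the
>>> preallocated variant looks roughly like this. Wrapping it in a function
>>> also keeps the loop out of global scope, which matters for speed in Julia.)

```julia
# Rough sketch of the preallocated variant (not the code actually
# posted): allocate the triple arrays once, then fill them in place.
function build_triples(vec_len)
    nnz_total = vec_len * 700
    row_ind = Array(Uint64, nnz_total)
    col_ind = Array(Uint64, nnz_total)
    value   = Array(Float64, nnz_total)
    i = 1
    for j = 1:vec_len
        @inbounds for k = 1:700
            row_ind[i] = k * 50   # same fill pattern as the original
            col_ind[i] = j
            value[i]   = 5.0
            i += 1
        end
    end
    row_ind, col_ind, value
end

row_ind, col_ind, value = build_triples(70000)
```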
>>> 
>>> 
>>> On Wednesday, April 30, 2014 12:26:02 AM UTC-7, Viral Shah wrote: 
>>> I believe the memory requirement should be 700000*700*16 (64-bit nonzeros 
>>> and row indices) + 700001*8 (64-bit column pointers) = 7.8 GB. 
>>> 
>>> This can be brought down a bit by using 32-bit index values and 64-bit 
>>> floats, but then you need 5.8 GB. Finally, if you use 32-bit index values 
>>> with 32-bit floats, you can come down to 4GB. The Julia sparse matrix 
>>> implementation is quite flexible and allows you to easily do such things. 
>>> 
>>> 
>>> julia> s = sparse(int32(1:10), int32(1:10), 1.0); 
>>> 
>>> julia> typeof(s) 
>>> SparseMatrixCSC{Float64,Int32} (constructor with 1 method) 
>>> 
>>> julia> s = sparse(int32(1:10), int32(1:10), float32(1.0)); 
>>> 
>>> julia> typeof(s) 
>>> SparseMatrixCSC{Float32,Int32} (constructor with 1 method) 
>>> 
>>> 
>>> -viral 
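>>> (Editorial sketch: applied to the original example, this means building
>>> the triples with the 32-bit types from the start; sizehint can pre-reserve
>>> the push! capacity.)

```julia
# Sketch: generate the triples with 32-bit indices and 32-bit values
# from the start, so the intermediate arrays and the final matrix both
# use the smaller types.
n = 700000
row_ind = Int32[]
col_ind = Int32[]
value   = Float32[]
sizehint(row_ind, 700 * n)   # reserve capacity up front
sizehint(col_ind, 700 * n)
sizehint(value, 700 * n)
# ... push! the entries as in the original loop ...
# a = sparse(row_ind, col_ind, value, n, n)
```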
>>> 
>>> On Wednesday, April 30, 2014 12:36:17 PM UTC+5:30, Ivar Nesje wrote: 
>>> Sorry for pointing out a probably obvious problem, but since there are 
>>> others who might try to debug this issue on their laptops, I have to ask: 
>>> how much memory do you have? 700000*700 floats + indices will require a 
>>> minimum of 11 GB (if my math is correct), and possibly more if the storage 
>>> requirement exceeds 2 Int64 + 1 Float64 per stored value. 
>>> 
>>> Ivar 
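>>> (That back-of-the-envelope figure checks out; each stored triple costs
>>> two Int64 indices plus one Float64 value, i.e. 24 bytes:)

```julia
# Checking the ~11 GB estimate: 700000*700 stored entries, each costing
# two Int64 indices (8 bytes each) plus one Float64 value (8 bytes).
nstored = 700000 * 700
bytes = nstored * (8 + 8 + 8)
println(bytes / 2^30)        # just under 11 GiB
```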
>>> 
>>> At 01:46:22 UTC+2 on Wednesday, 30 April 2014, Ryan Gardner wrote: 
>>> Creating sparse arrays seems exceptionally slow. 
>>> 
>>> I can set up the non-zero data of the array relatively quickly.  For 
>>> example, the following code takes about 80 seconds on one machine. 
>>> 
>>> 
>>> vec_len = 700000 
>>> 
>>> 
>>> row_ind = Uint64[] 
>>> col_ind = Uint64[] 
>>> value = Float64[] 
>>> 
>>> 
>>> for j = 1:vec_len 
>>>   for k = 1:700 
>>>      ind = k*50 
>>>      push!(row_ind, ind) 
>>>      push!(col_ind, j) 
>>>      push!(value, 5.0) 
>>>   end 
>>> end 
>>> 
>>> 
>>> but then 
>>> 
>>> a = sparse(row_ind, col_ind, value, vec_len, vec_len) 
>>> 
>>> 
>>> takes at least 30 minutes.  (I never let it finish.) 
>>> 
>>> It doesn't seem like the numbers I'm using should be that far off the 
>>> scale.  Is there a more efficient way I should be doing what I'm doing?  Am 
>>> I missing something and asking for something that really is impractical? 
>>> 
>>> If not, I may be able to look into the sparse matrix code a little this 
>>> weekend. 
>>> 
>>> 
>>> The never-finishing result is the same if I try 
>>> 
>>> sprand(700000, 700000, .001) 
>>> 
>>> or if I try to set 700000*700 values in a sparse matrix of zeros directly.  
>>> Thanks. 
>>> 
>>> 
>> 
>> 
> 
