There is no reason we should be slower than Matlab's C implementation by 
more than a factor of 2, ideally 1.5. Setting aside GC time, which is being 
addressed separately anyway: if you conclusively find certain sparse matrix 
operations to be slower than Matlab's, please file an issue. There are 
always some code paths and operations that have not been fully optimized 
just yet.

-viral

On Thursday, November 13, 2014 1:09:15 AM UTC+5:30, Joshua Tokle wrote:
>
> I unrolled the loop as suggested and it brought the runtime down to 85-100 
> seconds from around 220 seconds. Thanks! We're now approaching the speed of 
> the original matlab code, which runs in about 70 seconds. I'll try to think 
> of other ways to bypass the unnecessary creation of temporary objects.
>
> Out of curiosity, why do you suppose matlab performs so much better than 
> julia on the original implementation? Better sparse matrix operations? Does 
> it handle temporary objects more efficiently? Something else?
>
> On Tue, Nov 11, 2014 at 11:25 PM, Tony Kelman <[email protected]> wrote:
>
>> At this point the most effective way to write decently-performing sparse 
>> matrix code in Julia is to familiarize yourself with the compressed sparse 
>> column format 
>> (http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29) 
>> and work directly on the colptr, rowval, and nzval vectors. This isn't much 
>> better than writing sparse matrix code in C (just keep in mind that the 
>> arrays are 1-based in Julia, whereas if you were writing a mex file or 
>> standalone C code they would be 0-based), but it's not too terrible once 
>> you've done it a few times.
>>
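>> For example, here is a minimal sketch (a hypothetical helper, not a Base 
>> function) of walking the stored entries of one column directly off those 
>> three vectors: 
>> 
>>     # visit every stored entry in column j of a SparseMatrixCSC
>>     function foreach_in_column(A::SparseMatrixCSC, j::Integer)
>>         for k in A.colptr[j]:(A.colptr[j+1] - 1)
>>             i = A.rowval[k]   # row index of the k-th stored entry
>>             v = A.nzval[k]    # its value
>>             println("A[$i, $j] = $v")
>>         end
>>     end
>> 
>> All three fields are ordinary dense Vectors, so loops like this compile to 
>> tight code with no sparse-indexing overhead. 
>> 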
>> Any time you are trying to do a standard elementwise, binary, or 
>> reduction operation across dimensions - min, max, sum, sumabs, sumabs2, etc 
>> - on sparse matrices and you see the current Julia version is performing 
>> poorly because the fallback version is dense, you can implement it manually 
>> without too much trouble by operating directly on the colptr, rowval, and 
>> nzval vectors. Pull requests to add additional sparse-specialized 
>> operations to sparsematrix.jl are welcome. I'd personally like to see a 
>> more general design of the array system long-term that would reduce the 
>> number of specialized methods that need to be written for each sparse 
>> matrix format, but that's a ways off.
>>
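>> As one example of such a manual implementation, a column-wise sum needs 
>> only a single pass over nzval (a sketch, not the actual Base 
>> implementation): 
>> 
>>     # sketch: column sums of a sparse matrix, touching only stored entries
>>     function colsums(A::SparseMatrixCSC)
>>         out = zeros(eltype(A.nzval), size(A, 2))
>>         for j in 1:size(A, 2)
>>             for k in A.colptr[j]:(A.colptr[j+1] - 1)
>>                 out[j] += A.nzval[k]
>>             end
>>         end
>>         return out
>>     end
>> 
>> Note that for reductions like min and max the implicit zeros matter: a 
>> column with fewer stored entries than rows contains at least one zero, and 
>> the loop has to account for that. 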
>>
>> On Tuesday, November 11, 2014 7:11:22 AM UTC-8, Milan Bouchet-Valat wrote:
>>>
>>> On Tuesday, November 11, 2014 at 09:54 -0500, Joshua Tokle wrote:
>>> > I think I see: you're suggesting a single loop that performs the 
>>> > a*b*(c+d+e) operation element-wise, and this will prevent the 
>>> > allocation of intermediate results. Is that right? 
>>> Yes. 
>>>
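>>> For dense indicator vectors that single loop might look like the 
>>> following sketch (hypothetical function name, using the vector names from 
>>> your code): 
>>> 
>>>     # sketch: indices where step .* (C_name .* (C_assignee + C_city + C_class))
>>>     # is nonzero, computed in one pass with no intermediate vectors
>>>     function lump_indices(step, C_name, C_assignee, C_city, C_class)
>>>         out = Int[]
>>>         for i in 1:length(step)
>>>             if step[i] != 0 && C_name[i] != 0 &&
>>>                (C_assignee[i] != 0 || C_city[i] != 0 || C_class[i] != 0)
>>>                 push!(out, i)
>>>             end
>>>         end
>>>         return out
>>>     end
>>> 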
>>> > Yes, the sparsity pattern will differ between the vectors I'm 
>>> > operating on, but maybe I can adapt the code from sparsematrix.jl to 
>>> > my special case. 
>>> Indeed, that makes my suggestion more challenging to apply. A general 
>>> algorithm to combine several sparse matrices would be very useful. 
>>>
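>>> One building block would be a two-pointer merge over the sorted rowval 
>>> arrays. A sketch for a single pair (hypothetical helper; assumes two 
>>> one-column sparse vectors of the same length): 
>>> 
>>>     # sketch: a .+ b for one-column sparse vectors, one pass, no temporaries
>>>     function spadd(a::SparseMatrixCSC, b::SparseMatrixCSC)
>>>         ri = Int[]                  # row indices of the result
>>>         rv = similar(a.nzval, 0)    # values of the result
>>>         ka, kb = 1, 1
>>>         na, nb = length(a.rowval), length(b.rowval)
>>>         while ka <= na || kb <= nb
>>>             if kb > nb || (ka <= na && a.rowval[ka] < b.rowval[kb])
>>>                 push!(ri, a.rowval[ka]); push!(rv, a.nzval[ka]); ka += 1
>>>             elseif ka > na || b.rowval[kb] < a.rowval[ka]
>>>                 push!(ri, b.rowval[kb]); push!(rv, b.nzval[kb]); kb += 1
>>>             else                    # entry present in both: add the values
>>>                 push!(ri, a.rowval[ka]); push!(rv, a.nzval[ka] + b.nzval[kb])
>>>                 ka += 1; kb += 1
>>>             end
>>>         end
>>>         SparseMatrixCSC(size(a, 1), 1, [1, length(ri) + 1], ri, rv)
>>>     end
>>> 
>>> An AND-like combination (elementwise product) is the same walk but only 
>>> emits entries present in both patterns. 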
>>>
>>> Regards 
>>>
>>> > On Tue, Nov 11, 2014 at 9:43 AM, Milan Bouchet-Valat 
>>> > <[email protected]> wrote: 
>>> >         On Tuesday, November 11, 2014 at 04:52 -0800, Joshua Tokle wrote: 
>>> >         > Thanks for your response. Because the operations are on sparse 
>>> >         > matrices I'm pretty sure the arithmetic is already more 
>>> >         > optimized than something I would write: 
>>> >         > 
>>> >         > https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530 
>>> >         > 
>>> >         > I did actually spend some time searching for sparse matrix 
>>> >         > arithmetic algorithms, but I didn't come up with anything that 
>>> >         > seemed like it would be an improvement. 
>>> >         Of course the algorithm is optimized for sparse matrices, but 
>>> >         every call to + or .* creates a copy of the matrix, which is not 
>>> >         efficient at all. 
>>> > 
>>> >         > It seems that the issue may be with garbage collection. I'm 
>>> >         > going to post a top-level reply with more on this. 
>>> >         I'm not an expert on sparse matrices, but one efficient way to 
>>> >         avoid temporary copies and garbage collection is to manually 
>>> >         unroll your loop, as I advised in my first message. Now I must 
>>> >         admit it's more complex than it seems if the structure of 
>>> >         nonzeros is not always the same in the various matrices you want 
>>> >         to combine. Is that the case? 
>>> > 
>>> >         Regards 
>>> > 
>>> >         > On Monday, November 10, 2014 5:05:29 PM UTC-5, Milan 
>>> >         > Bouchet-Valat wrote: 
>>> >         >         On Monday, November 10, 2014 at 13:03 -0800, Joshua 
>>> >         >         Tokle wrote: 
>>> >         >         > Hello! I'm trying to replace an existing matlab code 
>>> >         >         > with julia and I'm having trouble matching the 
>>> >         >         > performance of the original code. The matlab code is 
>>> >         >         > here: 
>>> >         >         > 
>>> >         >         > https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m 
>>> >         >         > 
>>> >         >         > The program clusters inventors from a database of 
>>> >         >         > patent applications. The input data is a sparse 
>>> >         >         > boolean matrix (named XX in the script), where each 
>>> >         >         > row defines an inventor and each column defines a 
>>> >         >         > feature. For example, the jth column might correspond 
>>> >         >         > to a feature "first name is John". If there is a 1 in 
>>> >         >         > XX[i, j], this means that inventor i's first name is 
>>> >         >         > John. Given an inventor i, we find similar inventors 
>>> >         >         > by identifying rows in the matrix that agree with 
>>> >         >         > XX[i, :] on a given column and then applying 
>>> >         >         > element-wise boolean operations to the rows. In the 
>>> >         >         > code, for a given value of `index`, C_lastname holds 
>>> >         >         > the unique column in XX corresponding to a "last 
>>> >         >         > name" feature such that XX[index, :] equals 1. 
>>> >         >         > C_firstname holds the unique column in XX 
>>> >         >         > corresponding to a "first name" feature such that 
>>> >         >         > XX[index, :] equals 1. And so on. The following code 
>>> >         >         > snippet finds all rows in the matrix that agree with 
>>> >         >         > XX[index, :] on full name and one of patent assignee 
>>> >         >         > name, inventor city, or patent class: 
>>> >         >         > 
>>> >         >         >     lump_index_2 = step & ((C_assignee | C_city | C_class)) 
>>> >         >         > 
>>> >         >         > The `step` variable is an indicator that's used to 
>>> >         >         > prevent the same inventors from being considered 
>>> >         >         > multiple times. My attempt at a literal translation 
>>> >         >         > of this code to julia is here: 
>>> >         >         > 
>>> >         >         > https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl 
>>> >         >         > 
>>> >         >         > The matrix X is of type SparseMatrixCSC{Int64, 
>>> >         >         > Int64}. Boolean operations aren't supported for 
>>> >         >         > sparse matrices in julia, so I fake it with integer 
>>> >         >         > arithmetic. The line that corresponds to the matlab 
>>> >         >         > code above is 
>>> >         >         > 
>>> >         >         >     lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class))) 
>>> >         >         You should be able to get a speedup by replacing this 
>>> >         >         line with an explicit `for` loop. First, you'll avoid 
>>> >         >         memory allocation (one for each + or .* operation). 
>>> >         >         Second, you'll be able to return as soon as the index 
>>> >         >         is found, instead of computing the value for all 
>>> >         >         elements (IIUC you're only looking for one index, 
>>> >         >         right?). 
>>> >         > 
>>> >         >         My two cents 
>>> >         > 
>>> >         >         > The reason I grouped it this way is that initially 
>>> >         >         > `step` will be a "sparse" vector of all 1's, and I 
>>> >         >         > thought it might help to do the truly sparse 
>>> >         >         > arithmetic first. 
>>> >         >         > 
>>> >         >         > I've been testing this code on a Windows 2008 
>>> >         >         > Server. The test data contains 45,763 inventors and 
>>> >         >         > 274,578 possible features (in other words, XX is a 
>>> >         >         > 45,763 x 274,578 sparse matrix). The matlab program 
>>> >         >         > consistently takes about 70 seconds to run on this 
>>> >         >         > data. The julia version shows a lot of variation: 
>>> >         >         > it's taken as little as 60 seconds and as much as 10 
>>> >         >         > minutes. However, most runs take around 3.5 to 4 
>>> >         >         > minutes. I pasted one output from the sampling 
>>> >         >         > profiler here [1]. If I'm reading this correctly, it 
>>> >         >         > looks like the program is spending most of its time 
>>> >         >         > performing element-wise multiplication of the 
>>> >         >         > indicator vectors I described above. 
>>> >         >         > 
>>> >         >         > I would be grateful for any suggestions that would 
>>> >         >         > bring the performance of the julia program in line 
>>> >         >         > with the matlab version. I've heard that the last 
>>> >         >         > time the matlab code was run on the full data set it 
>>> >         >         > took a couple of days, so a slow-down of 3-4x is a 
>>> >         >         > significant burden. I did attempt to write a more 
>>> >         >         > idiomatic julia version using Dicts and Sets, but 
>>> >         >         > it's slower than the version that uses sparse matrix 
>>> >         >         > operations: 
>>> >         >         > 
>>> >         >         > https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl 
>>> >         >         > 
>>> >         >         > Thank you! 
>>> >         >         > Josh 
>>> >         >         > 
>>> >         >         > [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5 
