Re: [julia-users] Help optimizing sparse matrix code

Viral Shah Thu, 13 Nov 2014 19:38:42 -0800

Oops - correction. The goal is to be as fast or faster than 
Matlab/Octave/Scilab, but in certain cases, we may accept being within 1.5. 
So, report an issue if a particular sparse matrix operation is slower. Even 
if we can't address some such issues right away, it is good to have them in 
the issues to address later.


-viral

On Friday, November 14, 2014 9:06:23 AM UTC+5:30, Viral Shah wrote:
>
> There is no reason we should be slower than Matlab's C implementation by 
> more than a factor of 2, ideally 1.5. Not considering GC time, which is 
> being addressed separately anyways, if you conclusively find certain sparse 
> matrix operations slower than Matlab, please file an issue. There are 
> always some code paths and certain operations that may not have been fully 
> optimized just yet.
>
> -viral
>
> On Thursday, November 13, 2014 1:09:15 AM UTC+5:30, Joshua Tokle wrote:
>>
>> I unrolled the loop as suggested and it brought the runtime down to 
>> 85-100 seconds from around 220 seconds. Thanks! We're now approaching the 
>> speed of the original matlab code, which runs in about 70 seconds. I'll try 
>> to think of other was to bypass the unnecessary creation of temporary 
>> objects.
>>
>> Out of curiosity, why do you suppose the matlab code performs so well on 
>> the original implementation compared to julia? Better sparse matrix 
>> operations? Is it better optimized for temporary objects? Something else?
>>
>> On Tue, Nov 11, 2014 at 11:25 PM, Tony Kelman <[email protected]> wrote:
>>
>>> At this point the most effective way to write decently-performing sparse 
>>> matrix code in Julia is to familiarize yourself with the compressed sparse 
>>> column format (
>>> http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29)
>>>  
>>> and work directly on the colptr, rowval, and nzval vectors. This isn't much 
>>> better than writing sparse matrix code in C (just keep in mind that the 
>>> arrays are 1-based in Julia, whereas if you were writing a mex file or 
>>> standalone C code they would be 0-based), but it's not too terrible once 
>>> you've done it a few times.
>>>
>>> Any time you are trying to do a standard elementwise, binary, or 
>>> reduction operation across dimensions - min, max, sum, sumabs, sumabs2, etc 
>>> - on sparse matrices and you see the current Julia version is performing 
>>> poorly because the fallback version is dense, you can implement it manually 
>>> without too much trouble by operating directly on the colptr, rowval, and 
>>> nzval vectors. Pull requests to add additional sparse-specialized 
>>> operations to sparsematrix.jl are welcome. I'd personally like to see a 
>>> more general design of the array system long-term that would reduce the 
>>> number of specialized methods that need to be written for each sparse 
>>> matrix format, but that's a ways off.
>>>
>>>
>>> On Tuesday, November 11, 2014 7:11:22 AM UTC-8, Milan Bouchet-Valat 
>>> wrote:
>>>>
>>>> Le mardi 11 novembre 2014 à 09:54 -0500, Joshua Tokle a écrit : 
>>>> > I think I see: you're suggesting a single loop that performs the 
>>>> > a*b*(c+d+e) operation element-wise, and this will prevent the 
>>>> > allocation of intermediate results. Is that right? 
>>>> Yes. 
>>>>
>>>> > Yes, the sparsity pattern will differ between the vectors I'm 
>>>> > operating on, but maybe I can adapt the code from sparsematrix.jl to 
>>>> > my special case. 
>>>> Indeed, this makes my proposition more challenging. A general algorithm 
>>>> to combine several sparse matrices would be very useful. 
>>>>
>>>>
>>>> Regards 
>>>>
>>>> > On Tue, Nov 11, 2014 at 9:43 AM, Milan Bouchet-Valat 
>>>> > <[email protected]> wrote: 
>>>> >         Le mardi 11 novembre 2014 à 04:52 -0800, Joshua Tokle a 
>>>> >         écrit : 
>>>> >         > Thanks for your response. Because the operations are on 
>>>> >         sparse 
>>>> >         > matrices I'm pretty sure the arithmetic is already more 
>>>> >         optimized than 
>>>> >         > something I would write: 
>>>> >         > 
>>>> >         > 
>>>> >         https://github.com/JuliaLang/julia/blob/master/base/sparse/
>>>> sparsematrix.jl#L530 
>>>> >         > I did actually spend some time searching for sparse matrix 
>>>> >         arithmetic 
>>>> >         > algorithms, but I didn't come up with anything that seemed 
>>>> >         like it 
>>>> >         > would be an improvement. 
>>>> >         Of course the algorithm is optimized for sparse matrices, but 
>>>> >         every call 
>>>> >         to + or .* creates a copy of the matrix, which is not 
>>>> >         efficient at all. 
>>>> >         
>>>> >         > It seems that the issue may be with garbage collection. I'm 
>>>> >         going to 
>>>> >         > post a top-level reply with more on this. 
>>>> >         I'm not an expert of sparse matrices, but one efficient way 
>>>> of 
>>>> >         avoid 
>>>> >         temporary copies and garbage collection is to manually unroll 
>>>> >         your loop, 
>>>> >         as I advised in my first message. Now I must admit it's more 
>>>> >         complex 
>>>> >         than it seems if the structure of nonzeros is not always the 
>>>> >         same in the 
>>>> >         various matrices you want to combine. Is that the case? 
>>>> >         
>>>> >         
>>>> >         Regards 
>>>> >         
>>>> >         
>>>> >         Regards 
>>>> >         
>>>> >         > On Monday, November 10, 2014 5:05:29 PM UTC-5, Milan 
>>>> >         Bouchet-Valat 
>>>> >         > wrote: 
>>>> >         >         Le lundi 10 novembre 2014 à 13:03 -0800, Joshua 
>>>> >         Tokle a 
>>>> >         >         écrit : 
>>>> >         >         > Hello! I'm trying to replace an existing matlab 
>>>> >         code with 
>>>> >         >         julia and 
>>>> >         >         > I'm having trouble matching the performance of 
>>>> the 
>>>> >         original 
>>>> >         >         code. The 
>>>> >         >         > matlab code is here: 
>>>> >         >         > 
>>>> >         >         > 
>>>> >         > 
>>>> >          https://github.com/jotok/InventorDisambiguator/blob/
>>>> julia/Disambig.m 
>>>> >         >         > 
>>>> >         >         > The program clusters inventors from a database of 
>>>> >         patent 
>>>> >         >         applications. 
>>>> >         >         > The input data is a sparse boolean matrix (named 
>>>> >         XX in the 
>>>> >         >         script), 
>>>> >         >         > where each row defines an inventor and each 
>>>> column 
>>>> >         defines a 
>>>> >         >         feature. 
>>>> >         >         > For example, the jth column might correspond to a 
>>>> >         feature 
>>>> >         >         "first name 
>>>> >         >         > is John". If there is a 1 in the XX[i, j], this 
>>>> >         means that 
>>>> >         >         inventor 
>>>> >         >         > i's first name is John. Given an inventor i, we 
>>>> >         find similar 
>>>> >         >         inventors 
>>>> >         >         > by identifying rows in the matrix that agree with 
>>>> >         XX[i, :] 
>>>> >         >         on a given 
>>>> >         >         > column and then applying element-wise boolean 
>>>> >         operations to 
>>>> >         >         the rows. 
>>>> >         >         > In the code, for a given value of `index`, 
>>>> >         C_lastname holds 
>>>> >         >         the unique 
>>>> >         >         > column in XX corresponding to a "last name" 
>>>> >         feature such 
>>>> >         >         that 
>>>> >         >         > XX[index, :] equals 1. C_firstname holds the 
>>>> >         unique column 
>>>> >         >         in XX 
>>>> >         >         > corresponding to a "first name" feature such that 
>>>> >         >         XX[index, :] equals 
>>>> >         >         > 1. And so on. The following code snippet finds 
>>>> all 
>>>> >         rows in 
>>>> >         >         the matrix 
>>>> >         >         > that agree with XX[index, :] on full name and one 
>>>> >         of patent 
>>>> >         >         assignee 
>>>> >         >         > name, inventory city, or patent class: 
>>>> >         >         > 
>>>> >         >         >     lump_index_2 = step & ((C_assignee | C_city | 
>>>> >         C_class)) 
>>>> >         >         > 
>>>> >         >         > The `step` variable is an indicator that's used 
>>>> to 
>>>> >         prevent 
>>>> >         >         the same 
>>>> >         >         > inventors from being considered multiple times. 
>>>> My 
>>>> >         attempt 
>>>> >         >         at a 
>>>> >         >         > literal translation of this code to julia is 
>>>> here: 
>>>> >         >         > 
>>>> >         >         > 
>>>> >         > 
>>>> >          https://github.com/jotok/InventorDisambiguator/blob/
>>>> julia/disambig.jl 
>>>> >         >         > 
>>>> >         >         > The matrix X is of type SparseMatrixCSC{Int64, 
>>>> >         Int64}. 
>>>> >         >         Boolean 
>>>> >         >         > operations aren't supported for sparse matrices 
>>>> in 
>>>> >         julia, so 
>>>> >         >         I fake it 
>>>> >         >         > with integer arithmetic.  The line that 
>>>> >         corresponds to the 
>>>> >         >         matlab code 
>>>> >         >         > above is 
>>>> >         >         > 
>>>> >         >         >     lump_index_2 = find(step .* (C_name .* 
>>>> >         (C_assignee + 
>>>> >         >         C_city + C_class))) 
>>>> >         >         You should be able to get a speedup by replacing 
>>>> >         this line 
>>>> >         >         with an 
>>>> >         >         explicit `for` loop. First, you'll avoid memory 
>>>> >         allocation 
>>>> >         >         (one for each 
>>>> >         >         + or .* operation). Second, you'll be able to 
>>>> return 
>>>> >         as soon 
>>>> >         >         as the 
>>>> >         >         index is found, instead of computing the value for 
>>>> >         all 
>>>> >         >         elements (IIUC 
>>>> >         >         you're only looking for one index, right?). 
>>>> >         > 
>>>> >         > 
>>>> >         >         My two cents 
>>>> >         > 
>>>> >         >         > The reason I grouped it this way is that 
>>>> initially 
>>>> >         `step` 
>>>> >         >         will be a 
>>>> >         >         > "sparse" vector of all 1's, and I thought it 
>>>> might 
>>>> >         help to 
>>>> >         >         do the 
>>>> >         >         > truly sparse arithmetic first. 
>>>> >         >         > 
>>>> >         >         > I've been testing this code on a Windows 2008 
>>>> >         Server. The 
>>>> >         >         test data 
>>>> >         >         > contains 45,763 inventors and 274,578 possible 
>>>> >         features (in 
>>>> >         >         other 
>>>> >         >         > words, XX is an 45,763 x 274,58 sparse matrix). 
>>>> >         The matlab 
>>>> >         >         program 
>>>> >         >         > consistently takes about 70 seconds to run on 
>>>> this 
>>>> >         data. The 
>>>> >         >         julia 
>>>> >         >         > version shows a lot of variation: it's taken as 
>>>> >         little as 60 
>>>> >         >         seconds 
>>>> >         >         > and as much as 10 minutes. However, most runs 
>>>> take 
>>>> >         around 
>>>> >         >         3.5 to 4 
>>>> >         >         > minutes. I pasted one output from the sampling 
>>>> >         profiler here 
>>>> >         >         [1]. If 
>>>> >         >         > I'm reading this correctly, it looks like the 
>>>> >         program is 
>>>> >         >         spending most 
>>>> >         >         > of its time performing element-wise 
>>>> multiplication 
>>>> >         of the 
>>>> >         >         indicator 
>>>> >         >         > vectors I described above. 
>>>> >         >         > 
>>>> >         >         > I would be grateful for any suggestions that 
>>>> would 
>>>> >         bring 
>>>> >         >         the 
>>>> >         >         > performance of the julia program in line with the 
>>>> >         matlab 
>>>> >         >         version. I've 
>>>> >         >         > heard that the last time the matlab code was run 
>>>> >         on the full 
>>>> >         >         data set 
>>>> >         >         > it took a couple days, so a slow-down of 3-4x is 
>>>> a 
>>>> >         >         signficant burden. 
>>>> >         >         > I did attempt to write a more idiomatic julia 
>>>> >         version using 
>>>> >         >         Dicts and 
>>>> >         >         > Sets, but it's slower than the version that uses 
>>>> >         sparse 
>>>> >         >         matrix 
>>>> >         >         > operations: 
>>>> >         >         > 
>>>> >         >         > 
>>>> >         > 
>>>> >          https://github.com/jotok/InventorDisambiguator/blob/
>>>> julia/disambig2.jl 
>>>> >         >         > 
>>>> >         >         > Thank you! 
>>>> >         >         > Josh 
>>>> >         >         > 
>>>> >         >         > 
>>>> >         >         > [1] 
>>>> >         https://gist.github.com/jotok/6b469a1dc0ff9529caf5 
>>>> >         >         > 
>>>> >         >         > 
>>>> >         > 
>>>> >         
>>>> >         
>>>> > 
>>>> > 
>>>>
>>>>
>>

Re: [julia-users] Help optimizing sparse matrix code

Reply via email to