Also, there is a disadvantage to leaning too hard on CHOLMOD, UMFPACK, or
CSparse code: those libraries are licensed under the GPL or LGPL. The goal
is eventually to eliminate those dependencies.

On Wednesday, November 12, 2014 2:37:28 PM UTC-6, Tony Kelman wrote:
>
> Virtually all operations on sparse matrices in Matlab are dispatching out 
> to specialized C code, largely (but not entirely) from Tim Davis' 
> SuiteSparse library. There are Julia bindings to some pieces of that same 
> library - CHOLMOD for sparse Cholesky factorization and UMFPACK for sparse 
> LU factorization, but not too much else. We don't use SuiteSparse in Julia 
> for things like indexing, slicing, reductions, sparse matrix-vector 
> multiplication, sparse matrix-matrix multiplication, etc. Those operations 
> are implemented in pure Julia, and as you've seen, they're not very well 
> optimized right now. And there are many operations for which no one has 
> written a specialized method for SparseMatrixCSC yet, so they dispatch to 
> the generic AbstractArray version, which plainly does not work well for 
> large sparse matrices. The short-term solution is "write more of those 
> methods," and every little bit helps.
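>
> As a hypothetical sketch of what one of those specialized methods looks 
> like, here is how dispatch can route a reduction to the sparse storage 
> instead of the generic fallback (sumsq is a made-up name, not an existing 
> Base function):
>
>     sumsq(A::AbstractMatrix) = sum(A .^ 2)        # generic fallback
>     sumsq(A::SparseMatrixCSC) = sum(A.nzval .^ 2) # touches stored entries only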
>
>
> On Wednesday, November 12, 2014 11:39:15 AM UTC-8, Joshua Tokle wrote:
>>
>> I unrolled the loop as suggested, and it brought the runtime down to 
>> 85-100 seconds from around 220 seconds. Thanks! We're now approaching the 
>> speed of the original matlab code, which runs in about 70 seconds. I'll 
>> try to think of other ways to bypass the unnecessary creation of 
>> temporary objects.
>>
>> Out of curiosity, why do you suppose the matlab code performs so well on 
>> the original implementation compared to julia? Better sparse matrix 
>> operations? Is it better optimized for temporary objects? Something else?
>>
>> On Tue, Nov 11, 2014 at 11:25 PM, Tony Kelman <to...@kelman.net> wrote:
>>
>>> At this point the most effective way to write decently-performing 
>>> sparse matrix code in Julia is to familiarize yourself with the 
>>> compressed sparse column format 
>>> (http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29) 
>>> and work directly on the colptr, rowval, and nzval vectors. This isn't 
>>> much better than writing sparse matrix code in C (just keep in mind that 
>>> the arrays are 1-based in Julia, whereas if you were writing a mex file 
>>> or standalone C code they would be 0-based), but it's not too terrible 
>>> once you've done it a few times.
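>>>
>>> As a concrete illustration, here is a rough, untested sketch of a 
>>> sparse matrix-vector product written directly against those three 
>>> vectors (the function name is made up):
>>>
>>>     # y = A*x, walking the CSC storage column by column (1-based).
>>>     function csc_matvec(A::SparseMatrixCSC, x::Vector)
>>>         y = zeros(promote_type(eltype(A), eltype(x)), A.m)
>>>         for j in 1:A.n                               # each column j
>>>             xj = x[j]
>>>             for k in A.colptr[j]:(A.colptr[j+1] - 1) # stored entries of column j
>>>                 y[A.rowval[k]] += A.nzval[k] * xj
>>>             end
>>>         end
>>>         y
>>>     end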
>>>
>>> Any time you are trying to do a standard elementwise, binary, or 
>>> reduction operation across dimensions - min, max, sum, sumabs, sumabs2, etc 
>>> - on sparse matrices and you see the current Julia version is performing 
>>> poorly because the fallback version is dense, you can implement it manually 
>>> without too much trouble by operating directly on the colptr, rowval, and 
>>> nzval vectors. Pull requests to add additional sparse-specialized 
>>> operations to sparsematrix.jl are welcome. I'd personally like to see a 
>>> more general design of the array system long-term that would reduce the 
>>> number of specialized methods that need to be written for each sparse 
>>> matrix format, but that's a ways off.
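>>>
>>> For example, a sparse-aware column sum (like sum(A, 1), but returning a 
>>> plain vector) might look something like this rough, untested sketch, 
>>> touching only the stored entries:
>>>
>>>     function colsums(A::SparseMatrixCSC)
>>>         s = zeros(eltype(A), A.n)
>>>         for j in 1:A.n
>>>             for k in A.colptr[j]:(A.colptr[j+1] - 1)
>>>                 s[j] += A.nzval[k]
>>>             end
>>>         end
>>>         s
>>>     end
>>>
>>> (For min and max, remember that a column with fewer than m stored 
>>> entries implicitly contains zeros, so the reduction has to account for 
>>> them.)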
>>>
>>>
>>> On Tuesday, November 11, 2014 7:11:22 AM UTC-8, Milan Bouchet-Valat 
>>> wrote:
>>>>
>>>> On Tuesday, November 11, 2014 at 09:54 -0500, Joshua Tokle wrote:
>>>> > I think I see: you're suggesting a single loop that performs the 
>>>> > a*b*(c+d+e) operation element-wise, and this will prevent the 
>>>> > allocation of intermediate results. Is that right? 
>>>> Yes. 
>>>>
>>>> > Yes, the sparsity pattern will differ between the vectors I'm 
>>>> > operating on, but maybe I can adapt the code from sparsematrix.jl to 
>>>> > my special case. 
>>>> Indeed, that makes my suggestion more challenging to apply. A general 
>>>> algorithm to combine several sparse matrices would be very useful.
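>>>>
>>>> As an illustration of the merge idea, here is a minimal, untested 
>>>> sketch for two single-column sparse matrices, walking both sorted 
>>>> rowval lists at once (the function name is made up):
>>>>
>>>>     # Row indices where both sparse columns a and b have a stored,
>>>>     # nonzero entry (i.e. where a .* b would be nonzero).
>>>>     function nz_intersect(a::SparseMatrixCSC, b::SparseMatrixCSC)
>>>>         ia, ib = a.colptr[1], b.colptr[1]
>>>>         la, lb = a.colptr[2] - 1, b.colptr[2] - 1
>>>>         out = Int[]
>>>>         while ia <= la && ib <= lb
>>>>             ra, rb = a.rowval[ia], b.rowval[ib]
>>>>             if ra == rb
>>>>                 if a.nzval[ia] != 0 && b.nzval[ib] != 0
>>>>                     push!(out, ra)
>>>>                 end
>>>>                 ia += 1; ib += 1
>>>>             elseif ra < rb
>>>>                 ia += 1
>>>>             else
>>>>                 ib += 1
>>>>             end
>>>>         end
>>>>         out
>>>>     end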
>>>>
>>>>
>>>> Regards 
>>>>
>>>> > On Tue, Nov 11, 2014 at 9:43 AM, Milan Bouchet-Valat 
>>>> > <nali...@club.fr> wrote: 
>>>> >> On Tuesday, November 11, 2014 at 04:52 -0800, Joshua Tokle wrote: 
>>>> >>> Thanks for your response. Because the operations are on sparse 
>>>> >>> matrices, I'm pretty sure the arithmetic is already more optimized 
>>>> >>> than something I would write: 
>>>> >>> 
>>>> >>> https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530 
>>>> >>> 
>>>> >>> I did actually spend some time searching for sparse matrix 
>>>> >>> arithmetic algorithms, but I didn't come up with anything that 
>>>> >>> seemed like it would be an improvement. 
>>>> >> Of course the algorithm is optimized for sparse matrices, but every 
>>>> >> call to + or .* creates a copy of the matrix, which is not efficient 
>>>> >> at all. 
>>>> >>> It seems that the issue may be with garbage collection. I'm going 
>>>> >>> to post a top-level reply with more on this. 
>>>> >> I'm not an expert on sparse matrices, but one efficient way of 
>>>> >> avoiding temporary copies and garbage collection is to manually 
>>>> >> unroll your loop, as I advised in my first message. Now I must admit 
>>>> >> it's more complex than it seems if the structure of nonzeros is not 
>>>> >> always the same in the various matrices you want to combine. Is that 
>>>> >> the case? 
>>>> >> 
>>>> >> Regards 
>>>> >>> On Monday, November 10, 2014 5:05:29 PM UTC-5, Milan Bouchet-Valat 
>>>> >>> wrote: 
>>>> >>>> On Monday, November 10, 2014 at 13:03 -0800, Joshua Tokle wrote: 
>>>> >>>>> Hello! I'm trying to replace an existing matlab program with 
>>>> >>>>> julia, and I'm having trouble matching the performance of the 
>>>> >>>>> original code. The matlab code is here: 
>>>> >>>>> 
>>>> >>>>> https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m 
>>>> >>>>> 
>>>> >>>>> The program clusters inventors from a database of patent 
>>>> >>>>> applications. The input data is a sparse boolean matrix (named 
>>>> >>>>> XX in the script), where each row defines an inventor and each 
>>>> >>>>> column defines a feature. For example, the jth column might 
>>>> >>>>> correspond to a feature "first name is John". If there is a 1 in 
>>>> >>>>> XX[i, j], this means that inventor i's first name is John. Given 
>>>> >>>>> an inventor i, we find similar inventors by identifying rows in 
>>>> >>>>> the matrix that agree with XX[i, :] on a given column and then 
>>>> >>>>> applying element-wise boolean operations to the rows. In the 
>>>> >>>>> code, for a given value of `index`, C_lastname holds the unique 
>>>> >>>>> column in XX corresponding to a "last name" feature such that 
>>>> >>>>> XX[index, :] equals 1. C_firstname holds the unique column in XX 
>>>> >>>>> corresponding to a "first name" feature such that XX[index, :] 
>>>> >>>>> equals 1. And so on. The following code snippet finds all rows 
>>>> >>>>> in the matrix that agree with XX[index, :] on full name and one 
>>>> >>>>> of patent assignee name, inventor city, or patent class: 
>>>> >>>>> 
>>>> >>>>>     lump_index_2 = step & ((C_assignee | C_city | C_class)) 
>>>> >>>>> 
>>>> >>>>> The `step` variable is an indicator that's used to prevent the 
>>>> >>>>> same inventors from being considered multiple times. My attempt 
>>>> >>>>> at a literal translation of this code to julia is here: 
>>>> >>>>> 
>>>> >>>>> https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl 
>>>> >>>>> 
>>>> >>>>> The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean 
>>>> >>>>> operations aren't supported for sparse matrices in julia, so I 
>>>> >>>>> fake it with integer arithmetic. The line that corresponds to 
>>>> >>>>> the matlab code above is 
>>>> >>>>> 
>>>> >>>>>     lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class))) 
>>>> >>>> You should be able to get a speedup by replacing this line with 
>>>> >>>> an explicit `for` loop. First, you'll avoid memory allocation (one 
>>>> >>>> for each + or .* operation). Second, you'll be able to return as 
>>>> >>>> soon as the index is found, instead of computing the value for 
>>>> >>>> all elements (IIUC you're only looking for one index, right?). 
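>>>> >>>> 
>>>> >>>> A rough, untested sketch of such a loop, assuming step and the 
>>>> >>>> C_* variables are n-by-1 sparse 0/1 indicator columns as in the 
>>>> >>>> code above (the function name is made up for illustration): 
>>>> >>>> 
>>>> >>>>     # Equivalent to find(step .* (C_name .* (C_assignee + C_city + C_class)))
>>>> >>>>     # for 0/1 indicators, but without allocating temporary sparse vectors.
>>>> >>>>     function lump2(step, C_name, C_assignee, C_city, C_class)
>>>> >>>>         out = Int[]
>>>> >>>>         for k in C_name.colptr[1]:(C_name.colptr[2] - 1)
>>>> >>>>             i = C_name.rowval[k]  # row index of a stored entry of C_name
>>>> >>>>             if C_name.nzval[k] != 0 && step[i, 1] != 0 &&
>>>> >>>>                C_assignee[i, 1] + C_city[i, 1] + C_class[i, 1] != 0
>>>> >>>>                 push!(out, i)
>>>> >>>>             end
>>>> >>>>         end
>>>> >>>>         out
>>>> >>>>     end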
>>>> >>>> 
>>>> >>>> My two cents 
>>>> >>>>> The reason I grouped it this way is that initially `step` will 
>>>> >>>>> be a "sparse" vector of all 1's, and I thought it might help to 
>>>> >>>>> do the truly sparse arithmetic first. 
>>>> >>>>> 
>>>> >>>>> I've been testing this code on a Windows 2008 Server. The test 
>>>> >>>>> data contains 45,763 inventors and 274,578 possible features (in 
>>>> >>>>> other words, XX is a 45,763 x 274,578 sparse matrix). The matlab 
>>>> >>>>> program consistently takes about 70 seconds to run on this data. 
>>>> >>>>> The julia version shows a lot of variation: it's taken as little 
>>>> >>>>> as 60 seconds and as much as 10 minutes. However, most runs take 
>>>> >>>>> around 3.5 to 4 minutes. I pasted one output from the sampling 
>>>> >>>>> profiler here [1]. If I'm reading this correctly, it looks like 
>>>> >>>>> the program is spending most of its time performing element-wise 
>>>> >>>>> multiplication of the indicator vectors I described above. 
>>>> >>>>> 
>>>> >>>>> I would be grateful for any suggestions that would bring the 
>>>> >>>>> performance of the julia program in line with the matlab 
>>>> >>>>> version. I've heard that the last time the matlab code was run 
>>>> >>>>> on the full data set it took a couple of days, so a slow-down of 
>>>> >>>>> 3-4x is a significant burden. I did attempt to write a more 
>>>> >>>>> idiomatic julia version using Dicts and Sets, but it's slower 
>>>> >>>>> than the version that uses sparse matrix operations: 
>>>> >>>>> 
>>>> >>>>> https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl 
>>>> >>>>> 
>>>> >>>>> Thank you! 
>>>> >>>>> Josh 
>>>> >>>>> 
>>>> >>>>> [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5 
>>>>
>>
