Re: [julia-users] Matrix Extraction Efficiency Problem

Todd Leo Mon, 01 Dec 2014 19:27:06 -0800

Thanks Kevin, will try SubArrays first.
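
For anyone finding this thread later, here is a minimal sketch of the view-based version of the benchmark quoted below. It assumes a current Julia (1.x), where `view` is in Base and returns a SubArray; the 0.3/0.4 releases discussed in this thread spelled this differently. The names `sum_pairs_copy` and `sum_pairs_view` are made up for illustration, and the dot product is only a stand-in for whatever per-pair calculation is actually needed.

```julia
using LinearAlgebra   # dot
using SparseArrays    # sprand

# Copying version: x[:, i] allocates a fresh vector on every access.
function sum_pairs_copy(x)
    n = size(x, 2)
    total = 0.0
    for i in 1:n, j in 1:n
        total += dot(x[:, i], x[:, j])
    end
    return total
end

# View version: view(x, :, i) wraps the parent array without copying.
function sum_pairs_view(x)
    n = size(x, 2)
    total = 0.0
    for i in 1:n, j in 1:n
        total += dot(view(x, :, i), view(x, :, j))
    end
    return total
end

A = rand(200, 200)   # smaller than the 1000 x 1000 case, to keep it quick
sum_pairs_copy(A); sum_pairs_view(A)   # warm up (compile both functions)
@time sum_pairs_copy(A)
@time sum_pairs_view(A)

# The same call also works on a sparse matrix and gives a SubArray.
S = sprand(100, 100, 0.1)
col = view(S, :, 3)
```

The two `@time` lines should show the difference in allocations; exact numbers depend on the machine and Julia version, so none are quoted here.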


On Tuesday, December 2, 2014 10:34:17 AM UTC+8, Tim Holy wrote:
>
> On Monday, December 01, 2014 06:25:45 PM Todd Leo wrote:
> > I've also tried ArrayViews on my previous applications, except ArrayViews
> > does not support sparse matrices. Will check on Linda Hua's source code and
> > try to implement views on sparse matrices soon.
>
> A potentially easier path: SubArrays support views of sparse matrices. It's
> also worth noting that in julia 0.4, SubArrays have been improved and have
> performance similar to (or better than) ArrayViews.
>
> --Tim 
>
> > 
> > -- 
> > REGARDS, 
> > Todd Leo 
> > 
> > On Monday, December 1, 2014 8:55:15 PM UTC+8, Milan Bouchet-Valat wrote: 
> > > On Monday, 1 December 2014 at 02:50 +0000, SLiZn Liu wrote:
> > > > I have an n by m dense matrix, and each row is a vector
> > > > representing a varying flow such as a stock price, and I'd like to find
> > > > out the two vectors which have the highest similarity using cor().
> > > > Hence, a nested for-loop was used to calculate the similarity
> > > > between each pair, and fill the similarity into an n by n adjacency
> > > > matrix.
> > > 
> > > In that case you can simply use the Distances.jl package like this: 
> > > pairwise(CorrDist(), x) 
> > > 
> > > (Though this will compute correlations between columns, not rows. And 
> > > the distance is 1 - correlation.) 
> > > 
> > > 
> > > If you look at the code it uses by calling
> > > edit(pairwise!)
> > >
> > > you'll see that it relies on array views to avoid creating copies of the
> > > columns.
> > > 
> > > 
> > > Regards 
> > > 
> > > > On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat
> > > > <[email protected]> wrote:
> > > >         On Friday, 28 November 2014 at 10:21 +0000, SLiZn Liu wrote:
> > > >         > I'm doing row-wise/col-wise calculation, isn't it inevitable
> > > >         > to create row/col copies after iteratively extracting single
> > > >         > elements?
> > > >         
> > > >         No, I don't think so, though sometimes you'll want to extract
> > > >         a full row/column to pass it to a standard function instead of
> > > >         writing all of the computations by hand. That's where array
> > > >         views are very useful.
> > > >         
> > > >         But can you give more details about the calculation you need 
> > > >         to do? 
> > > >         
> > > >         
> > > >         Regards 
> > > >         
> > > >         > I will consider taking a shot at option 1, ArrayViews, if this
> > > >         > single-element extraction comes to a dead end. Thanks, Milan!
> > > >         
> > > >         > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat
> > > >         > <[email protected]> wrote:
> > > >         >         On Friday, 28 November 2014 at 01:45 -0800, Todd Leo wrote:
> > > >         >         > Hi Fellows, 
> > > >         >         > 
> > > >         >         > 
> > > >         >         > Say I have a 1000 x 1000 matrix, and I'm going to do some
> > > >         >         > calculation in a nested for-loop, with each pair of
> > > >         >         > rows/cols in the matrix. But I suffered a heavy performance
> > > >         >         > penalty in row/col extraction. Here's my minimum
> > > >         >         > reproducible example, which I hope explains itself.
> > > >         
> > > >         >         > A = rand(0.:0.01:1.,1000,1000)
> > > >         >         >
> > > >         >         > function test(x)
> > > >         >         >     for i in 1:1000, j in 1:1000
> > > >         >         >         x[:,i]
> > > >         >         >         x[:,j]
> > > >         >         >     end
> > > >         >         > end
> > > >         >         >
> > > >         >         > test(A) # warm up
> > > >         >         > gc()
> > > >         >         > @time test(A)
> > > >         >         > ## elapsed time: 13.28547939 seconds (16208000080 bytes allocated, 72.42% gc time)
> > > >         >         > 
> > > >         >         > It takes 13 seconds, only extracting the rows/cols for the
> > > >         >         > sake of further calculations. I'm wondering if there is
> > > >         >         > anything I could do to improve the performance. Thanks in
> > > >         >         > advance.
> > > >         >         
> > > >         >         This is because extracting a row/column creates a copy.
> > > >         >         Depending on what calculation you want to do on them, you can:
> > > >         >         - use array views (which will become the default when
> > > >         >           extracting slices in 0.4):
> > > >         >           https://github.com/JuliaLang/ArrayViews.jl
> > > >         >         - manually write loops to go over the row and column so that
> > > >         >           you only extract one individual element of the matrix at a
> > > >         >           time
> > > >         >
> > > >         >         Regards
>
>
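
For the row-similarity problem described in the quoted messages, a rough sketch that avoids the nested loop altogether. It assumes a current Julia (1.x) with the Statistics standard library, where `cor` accepts a `dims` keyword; the matrix `X` and its size are made up for illustration. Unlike `pairwise(CorrDist(), x)` as described in the thread, this works on rows directly and returns the correlation itself rather than 1 - correlation.

```julia
using Statistics   # cor

X = rand(50, 200)   # 50 series (rows), 200 observations each

# dims=2 treats each row as a variable, giving a 50 x 50 correlation matrix.
C = cor(X; dims=2)

# Mask the diagonal (self-correlation) before searching for the maximum.
for k in axes(C, 1)
    C[k, k] = -Inf
end

best = argmax(C)     # CartesianIndex of the most correlated pair
i, j = Tuple(best)
println("most similar rows: ", i, " and ", j, ", correlation ", C[i, j])
```

Because the whole matrix is computed in one call, no per-row copies or views are needed; the cost is the n by n result held in memory.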
