Re: [julia-users] Matrix Extraction Efficiency Problem

Tim Holy Mon, 01 Dec 2014 18:35:03 -0800

On Monday, December 01, 2014 06:25:45 PM Todd Leo wrote:
>> I've also tried ArrayViews on my previous applications, except ArrayViews
> does not support sparse matrices. Will check on Linda Hua's source code and
> try to implement views on sparse matrices soon.


A potentially easier path: SubArrays support views of sparse matrices. It's 
also worth noting that in julia 0.4, SubArrays have been improved and have 
performance similar to (or better than) ArrayViews.

--Tim

> 
> --
> REGARDS,
> Todd Leo
> 
> On Monday, December 1, 2014 8:55:15 PM UTC+8, Milan Bouchet-Valat wrote:
> > Le lundi 01 décembre 2014 à 02:50 +0000, SLiZn Liu a écrit :
> > > I have a n by m dense matrix, and each row is a vector
> > > representing variating flows like stock price, and I'd like to find
> > > out the two vectors which have the highest similarity using cor().
> > > Hence, a nested for-loop was utilized to calculate the similarity
> > > between each pair, and fill the similarity into an n by n adjacency
> > > matrix.
> > 
> > In that case you can simply use the Distances.jl package like this:
> > pairwise(CorrDist(), x)
> > 
> > (Though this will compute correlations between columns, not rows. And
> > the distance is 1 - correlation.)
> > 
> > 
> > If you look at the code it uses by calling
> > edit(pairwise!)
> > 
> > you'll see that it relies on array views to avoid creating copies of the
> > columns.
> > 
> > 
> > Regards
> > 
> > > On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat
> > > 
> > > <[email protected] <javascript:>> wrote:
> > >         Le vendredi 28 novembre 2014 à 10:21 +0000, SLiZn Liu a
> > >         
> > >         écrit :
> > >         > I'm doing row-wise/col-wise calculation, isn't it inevitable
> > >         
> > >         to create
> > >         
> > >         > row/col copies after iteratively extract single elements?
> > >         
> > >         No, I don't think so, though sometimes you'll want to extract
> > >         a full
> > >         row/column to pass it to a standard function instead of
> > >         writing all of
> > >         the computations by hand. That's where array views are very
> > >         useful.
> > >         
> > >         But can you give more details about the calculation you need
> > >         to do?
> > >         
> > >         
> > >         Regards
> > >         
> > >         > I will consider to take a shot on option 1, ArrayViews if
> > >         
> > >         this
> > >         
> > >         > single-element-extraction comes to a dead end. Thanks,
> > >         
> > >         Milan!
> > >         
> > >         > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat
> > >         > 
> > >         > <[email protected] <javascript:>> wrote:
> > >         >         Le vendredi 28 novembre 2014 à 01:45 -0800, Todd Leo
> > >         
> > >         a écrit :
> > >         >         > Hi Fellows,
> > >         >         > 
> > >         >         > 
> > >         >         > Say I have a 1000 x 1000 matrix, and I'm going to
> > >         
> > >         do some
> > >         
> > >         >         calculation
> > >         >         
> > >         >         > in a nested for-loop, with each pair of rows/cols
> > >         
> > >         in the
> > >         
> > >         >         matrix. But I
> > >         >         
> > >         >         > suffered a heavy performance penalty in row/col
> > >         
> > >         extraction.
> > >         
> > >         >         Here's my
> > >         >         
> > >         >         > minimum reproducible example, which I hope
> > >         
> > >         explains itself.
> > >         
> > >         >         > A = rand(0.:0.01:1.,1000,1000)
> > >         >         > 
> > >         >         > 
> > >         >         > function test(x)
> > >         >         > 
> > >         >         >     for i in 1:1000, j in 1:1000
> > >         >         >     
> > >         >         >         x[:,i]
> > >         >         >         x[:,j]
> > >         >         >     
> > >         >         >     end
> > >         >         > 
> > >         >         > end
> > >         >         > 
> > >         >         > 
> > >         >         > test(A) # warm up
> > >         >         > gc()
> > >         >         > @time test(A)
> > >         >         > ## elapsed time: 13.28547939 seconds (16208000080
> > >         
> > >         bytes
> > >         
> > >         >         allocated,
> > >         >         
> > >         >         > 72.42% gc time)
> > >         >         > 
> > >         >         >  It takes 13 seconds, only extracting the
> > >         
> > >         rows/cols for the
> > >         
> > >         >         sake of
> > >         >         
> > >         >         > further calculations. I'm wondering if anything I
> > >         
> > >         could do
> > >         
> > >         >         to improve
> > >         >         
> > >         >         > the performance.Thanks in advance.
> > >         >         
> > >         >         This is because extracting a row/column creates a
> > >         
> > >         copy.
> > >         
> > >         >         Depending on
> > >         >         what calculation you want to do on them, you can:
> > >         >         - use arrays views (which will become the default
> > >         
> > >         when
> > >         
> > >         >         extracting slices
> > >         >         in 0.4): https://github.com/JuliaLang/ArrayViews.jl
> > >         >         - manually write loops to go over the row and column
> > >         
> > >         so that
> > >         
> > >         >         you only
> > >         >         extract one individual element of the matrix at a
> > >         
> > >         time
> > >         
> > >         >         Regards

Re: [julia-users] Matrix Extraction Efficiency Problem

Reply via email to