On Monday, December 1, 2014 at 02:50 +0000, SLiZn Liu wrote:
> I have an n by m dense matrix, and each row is a vector representing a
> varying series such as stock prices, and I'd like to find the two
> vectors with the highest similarity using cor(). Hence, a nested
> for-loop is used to calculate the similarity between each pair and
> fill it into an n by n adjacency matrix.
In that case you can simply use the Distances.jl package like this:
pairwise(CorrDist(), x)

(Though this will compute correlations between columns, not rows. And
the distance is 1 - correlation.)
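
For example, here is a rough (untested) sketch of finding the most similar pair that way; it assumes your vectors are stored as rows, hence the transpose, and it masks the zero self-distances before searching:

using Distances

X = rand(100, 50)                    # toy data: 100 vectors of length 50, one per row
D = pairwise(CorrDist(), X')         # transpose so each vector becomes a column; D[i,j] = 1 - cor
D[diagind(D)] = Inf                  # ignore self-distances on the diagonal
i, j = ind2sub(size(D), indmin(D))   # indices of the most similar pair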


If you look at the code it uses by calling
edit(pairwise!)

you'll see that it relies on array views to avoid creating copies of the
columns.
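
For comparison with the copying version quoted at the bottom of this thread, here is a minimal (untested) sketch of the same double loop written with ArrayViews.jl; the computation inside is only a placeholder for whatever you do per pair:

using ArrayViews

function test_views(x)
    s = 0.0
    n = size(x, 2)
    for i in 1:n, j in 1:n
        a = view(x, :, i)   # lightweight view, no copy of the column
        b = view(x, :, j)
        s += a[1] * b[1]    # placeholder for the real per-pair computation
    end
    return s
end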


Regards


> On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat
> <[email protected]> wrote:
>         On Friday, November 28, 2014 at 10:21 +0000, SLiZn Liu wrote:
>         > I'm doing row-wise/col-wise calculation; isn't it inevitable to
>         > create row/col copies after iteratively extracting single elements?
>         No, I don't think so, though sometimes you'll want to extract a full
>         row/column to pass it to a standard function instead of writing all
>         of the computations by hand. That's where array views are very
>         useful.
>         
>         But can you give more details about the calculation you need to do?
>         
>         
>         Regards
>         
>         > I will consider taking a shot at option 1, ArrayViews, if this
>         > single-element extraction comes to a dead end. Thanks, Milan!
>         >
>         >
>         >
>         > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat
>         > <[email protected]> wrote:
>         >         On Friday, November 28, 2014 at 01:45 -0800, Todd Leo wrote:
>         >         > Hi Fellows,
>         >         >
>         >         > Say I have a 1000 x 1000 matrix, and I'm going to do some
>         >         > calculation in a nested for-loop, with each pair of
>         >         > rows/cols in the matrix. But I'm suffering a heavy
>         >         > performance penalty in row/col extraction. Here's my
>         >         > minimal reproducible example, which I hope explains itself.
>         >         >
>         >         > A = rand(0.:0.01:1., 1000, 1000)
>         >         >
>         >         > function test(x)
>         >         >     for i in 1:1000, j in 1:1000
>         >         >         x[:,i]
>         >         >         x[:,j]
>         >         >     end
>         >         > end
>         >         >
>         >         > test(A) # warm up
>         >         > gc()
>         >         > @time test(A)
>         >         > ## elapsed time: 13.28547939 seconds (16208000080 bytes allocated, 72.42% gc time)
>         >         >
>         >         > It takes 13 seconds just to extract the rows/cols for the
>         >         > sake of further calculations. I'm wondering if there is
>         >         > anything I could do to improve the performance. Thanks in
>         >         > advance.
>         >         This is because extracting a row/column creates a copy.
>         >         Depending on what calculation you want to do on them, you
>         >         can:
>         >         - use array views (which will become the default when
>         >           extracting slices in 0.4):
>         >           https://github.com/JuliaLang/ArrayViews.jl
>         >         - manually write loops to go over the row and column so
>         >           that you only extract one individual element of the
>         >           matrix at a time
>         >
>         >
>         >         Regards
>         
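
P.S. For completeness, a rough sketch of the second option from my quoted reply above (looping over individual elements instead of extracting whole rows/columns); the dot-product-style accumulation and the function name are only placeholders for whatever you actually compute per pair:

function pair_scores(x)
    n = size(x, 2)
    out = zeros(n, n)
    for i in 1:n, j in 1:n
        s = 0.0
        for k in 1:size(x, 1)
            s += x[k, i] * x[k, j]   # touch one element at a time, no temporary columns
        end
        out[i, j] = s
    end
    return out
end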
