On Monday, 1 December 2014 at 02:50 +0000, SLiZn Liu wrote:
> I have an n by m dense matrix, and each row is a vector
> representing varying flows like stock prices, and I'd like to find
> out the two vectors which have the highest similarity using cor().
> Hence, a nested for-loop was utilized to calculate the similarity
> between each pair, and fill the similarity into an n by n adjacency
> matrix.

In that case you can simply use the Distances.jl package like this:

    pairwise(CorrDist(), x)

(Though this will compute correlations between columns, not rows. And
the distance is 1 - correlation.)

If you look at the code it uses by calling edit(pairwise!), you'll see
that it relies on array views to avoid creating copies of the columns.

Regards

> On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat wrote:
> On Friday, 28 November 2014 at 10:21 +0000, SLiZn Liu wrote:
> > I'm doing row-wise/col-wise calculation, isn't it inevitable to
> > create row/col copies after iteratively extracting single elements?
> No, I don't think so, though sometimes you'll want to extract a full
> row/column to pass it to a standard function instead of writing all of
> the computations by hand. That's where array views are very useful.
>
> But can you give more details about the calculation you need to do?
>
> Regards
>
> > I will consider taking a shot at option 1, ArrayViews, if this
> > single-element extraction comes to a dead end. Thanks, Milan!
> >
> > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat wrote:
> > On Friday, 28 November 2014 at 01:45 -0800, Todd Leo wrote:
> > > Hi Fellows,
> > >
> > > Say I have a 1000 x 1000 matrix, and I'm going to do some
> > > calculation in a nested for-loop, with each pair of rows/cols in
> > > the matrix. But I suffered a heavy performance penalty in row/col
> > > extraction. Here's my minimum reproducible example, which I hope
> > > explains itself.
> > >
> > >     A = rand(0.:0.01:1.,1000,1000)
> > >
> > >     function test(x)
> > >         for i in 1:1000, j in 1:1000
> > >             x[:,i]
> > >             x[:,j]
> > >         end
> > >     end
> > >
> > >     test(A) # warm up
> > >     gc()
> > >     @time test(A)
> > >     ## elapsed time: 13.28547939 seconds (16208000080 bytes allocated, 72.42% gc time)
> > >
> > > It takes 13 seconds, only extracting the rows/cols for the sake of
> > > further calculations. I'm wondering if there is anything I could
> > > do to improve the performance. Thanks in advance.
> > This is because extracting a row/column creates a copy. Depending on
> > what calculation you want to do on them, you can:
> > - use array views (which will become the default when extracting
> >   slices in 0.4): https://github.com/JuliaLang/ArrayViews.jl
> > - manually write loops to go over the row and column so that you
> >   only extract one individual element of the matrix at a time
> >
> > Regards
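
For reference, the view-based approach discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code Distances.jl uses. It assumes a current Julia (1.x), where view() is in Base and cor() lives in the Statistics standard library, rather than the ArrayViews.jl package and Julia 0.3 used in the thread. The helper names rowcor and mostsimilar are made up for this example.

```julia
using Statistics  # standard library, provides cor()

# Hypothetical helper: Pearson correlation between every pair of rows of x.
# view(x, i, :) refers to row i in place, so no row is copied.
function rowcor(x::AbstractMatrix)
    n = size(x, 1)
    c = Matrix{Float64}(undef, n, n)
    for i in 1:n
        c[i, i] = 1.0
        ri = view(x, i, :)
        for j in (i + 1):n
            # The matrix is symmetric, so each pair is computed only once.
            c[i, j] = c[j, i] = cor(ri, view(x, j, :))
        end
    end
    return c
end

# Hypothetical helper: indices of the two distinct rows with the
# highest correlation, scanning the upper triangle only.
function mostsimilar(c::AbstractMatrix)
    best, bi, bj = -Inf, 0, 0
    n = size(c, 1)
    for i in 1:n, j in (i + 1):n
        if c[i, j] > best
            best, bi, bj = c[i, j], i, j
        end
    end
    return bi, bj
end

A = rand(20, 50)
A[7, :] = 2 .* A[3, :] .+ 1   # make row 7 a linear function of row 3
C = rowcor(A)
println(mostsimilar(C))
```

With the standard library alone, cor(A; dims=2) should give the same n by n matrix in a single call, so the explicit loop is mainly useful when the per-pair computation is something other than a plain correlation.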
