Just FYI, "Linda Hua" is actually "Dahua Lin". :-)

Cheers,
Kevin

On Monday, December 1, 2014, Todd Leo <[email protected]> wrote:

> Big thanks for your tips!
>
>> (Though this will compute correlations between columns, not rows. And
>> the distance is 1 - correlation.)
>
> Since the matrix transpose is easy to obtain, there's no obstacle to
> adapting this to CorrDist().
>
> I've also tried ArrayViews in my previous applications, except that
> ArrayViews does not support sparse matrices. Will check Linda Hua's
> source code and try to implement views on sparse matrices soon.
>
> --
> REGARDS,
> Todd Leo
>
> On Monday, December 1, 2014 8:55:15 PM UTC+8, Milan Bouchet-Valat wrote:
>>
>> On Monday, 1 December 2014 at 02:50 +0000, SLiZn Liu wrote:
>> > I have an n by m dense matrix, where each row is a vector
>> > representing a varying flow such as a stock price, and I'd like to
>> > find the two vectors with the highest similarity using cor().
>> > Hence, a nested for-loop was used to calculate the similarity
>> > between each pair and fill it into an n by n adjacency matrix.
>> In that case you can simply use the Distances.jl package like this:
>> pairwise(CorrDist(), x)
>>
>> (Though this will compute correlations between columns, not rows. And
>> the distance is 1 - correlation.)
>>
>> If you look at the code it uses by calling
>> edit(pairwise!)
>>
>> you'll see that it relies on array views to avoid creating copies of
>> the columns.
>>
>> Regards
>>
>> > On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat
>> > <[email protected]> wrote:
>> > On Friday, 28 November 2014 at 10:21 +0000, SLiZn Liu wrote:
>> > > I'm doing row-wise/col-wise calculation, isn't it inevitable to
>> > > create row/col copies after iteratively extracting single elements?
>> > No, I don't think so, though sometimes you'll want to extract a full
>> > row/column to pass it to a standard function instead of writing all
>> > of the computations by hand. That's where array views are very
>> > useful.
>> >
>> > But can you give more details about the calculation you need to do?
>> >
>> > Regards
>> >
>> > > I will consider taking a shot at option 1, ArrayViews, if this
>> > > single-element extraction comes to a dead end. Thanks, Milan!
>> > >
>> > > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat
>> > > <[email protected]> wrote:
>> > > On Friday, 28 November 2014 at 01:45 -0800, Todd Leo wrote:
>> > > > Hi Fellows,
>> > > >
>> > > > Say I have a 1000 x 1000 matrix, and I'm going to do some
>> > > > calculation in a nested for-loop, with each pair of rows/cols in
>> > > > the matrix. But I suffered a heavy performance penalty in row/col
>> > > > extraction. Here's my minimum reproducible example, which I hope
>> > > > explains itself.
>> > > >
>> > > > A = rand(0.:0.01:1.,1000,1000)
>> > > >
>> > > > function test(x)
>> > > >     for i in 1:1000, j in 1:1000
>> > > >         x[:,i]
>> > > >         x[:,j]
>> > > >     end
>> > > > end
>> > > >
>> > > > test(A) # warm up
>> > > > gc()
>> > > > @time test(A)
>> > > > ## elapsed time: 13.28547939 seconds (16208000080 bytes allocated, 72.42% gc time)
>> > > >
>> > > > It takes 13 seconds just to extract the rows/cols for the sake of
>> > > > further calculations. I'm wondering if there is anything I could
>> > > > do to improve the performance. Thanks in advance.
>> > > This is because extracting a row/column creates a copy. Depending
>> > > on what calculation you want to do on them, you can:
>> > > - use array views (which will become the default when extracting
>> > >   slices in 0.4): https://github.com/JuliaLang/ArrayViews.jl
>> > > - manually write loops to go over the row and column so that you
>> > >   only extract one individual element of the matrix at a time
>> > >
>> > > Regards
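
The two options Milan lists in the thread (views, or explicit element-wise loops) can be sketched roughly as follows. This is only an illustration, assuming a current Julia 1.x release rather than the 0.3 version the thread was written against: it uses Base's `view` in place of ArrayViews.jl, and `cor` from the Statistics standard library in place of `pairwise(CorrDist(), x)` from Distances.jl, so that it has no package dependencies. The function names `cor_view`, `cor_loop` and `most_similar_rows` are hypothetical helpers, not something from the thread.

```julia
using Statistics  # provides cor

# Option 1: correlation of columns i and j through views.
# view(x, :, i) refers to the column in place instead of copying it.
cor_view(x, i, j) = cor(view(x, :, i), view(x, :, j))

# Option 2: the same correlation with explicit loops that read one
# matrix element at a time, so no column is ever extracted.
function cor_loop(x, i, j)
    n = size(x, 1)
    mi = 0.0
    mj = 0.0
    for k in 1:n
        mi += x[k, i]
        mj += x[k, j]
    end
    mi /= n
    mj /= n
    sij = 0.0
    sii = 0.0
    sjj = 0.0
    for k in 1:n
        di = x[k, i] - mi
        dj = x[k, j] - mj
        sij += di * dj
        sii += di * di
        sjj += dj * dj
    end
    return sij / sqrt(sii * sjj)
end

# The stated goal (most similar pair of rows): cor(x, dims=2) treats each
# row as a variable and returns the full row-by-row correlation matrix.
function most_similar_rows(x)
    c = cor(x, dims=2)
    best = (-Inf, 0, 0)
    for j in 2:size(c, 1), i in 1:j-1   # upper triangle, skipping the diagonal
        if c[i, j] > best[1]
            best = (c[i, j], i, j)
        end
    end
    return best   # (correlation, row i, row j)
end

A = rand(200, 50)
# Both variants should agree with the copying version A[:, 1], A[:, 2].
@assert cor_view(A, 1, 2) ≈ cor(A[:, 1], A[:, 2])
@assert cor_loop(A, 1, 2) ≈ cor(A[:, 1], A[:, 2])
```

For the pairwise case the whole-matrix call in `most_similar_rows` is usually the simpler route, since it avoids the nested loop over pairs altogether; the per-pair functions are mainly useful when the computation on each pair is something other than a plain correlation.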
