Just FYI, "Linda Hua" is actually "Dahua Lin".  :-)

Cheers,
   Kevin

On Monday, December 1, 2014, Todd Leo <[email protected]> wrote:

> Big thanks for your tips!
>
> (Though this will compute correlations between columns, not rows. And
>> the distance is 1 - correlation.)
>>
>  Since matrix transpose is easy to obtain, there's no obstacle to adapt to
> CorrDist() .
>
> I've also tried ArrayViews on my previous applications, except ArrayViews
> does not support sparse matrices. Will check on Linda Hua's source code and
> try to implement views on sparse matrices soon.
>
> --
> REGARDS,
> Todd Leo
>
> On Monday, December 1, 2014 8:55:15 PM UTC+8, Milan Bouchet-Valat wrote:
>>
>> Le lundi 01 décembre 2014 à 02:50 +0000, SLiZn Liu a écrit :
>> > I have a n by m dense matrix, and each row is a vector
>> > representing variating flows like stock price, and I'd like to find
>> > out the two vectors which have the highest similarity using cor().
>> > Hence, a nested for-loop was utilized to calculate the similarity
>> > between each pair, and fill the similarity into an n by n adjacency
>> > matrix.
>> In that case you can simply use the Distances.jl package like this:
>> pairwise(CorrDist(), x)
>>
>> (Though this will compute correlations between columns, not rows. And
>> the distance is 1 - correlation.)
>>
>>
>> If you look at the code it uses by calling
>> edit(pairwise!)
>>
>> you'll see that it relies on array views to avoid creating copies of the
>> columns.
>>
>>
>> Regards
>>
>>
>> > On Fri Nov 28 2014 at 8:49:51 PM Milan Bouchet-Valat
>> > <[email protected]> wrote:
>> >         Le vendredi 28 novembre 2014 à 10:21 +0000, SLiZn Liu a
>> >         écrit :
>> >         > I'm doing row-wise/col-wise calculation, isn't it inevitable
>> >         to create
>> >         > row/col copies after iteratively extract single elements?
>> >         No, I don't think so, though sometimes you'll want to extract
>> >         a full
>> >         row/column to pass it to a standard function instead of
>> >         writing all of
>> >         the computations by hand. That's where array views are very
>> >         useful.
>> >
>> >         But can you give more details about the calculation you need
>> >         to do?
>> >
>> >
>> >         Regards
>> >
>> >         > I will consider to take a shot on option 1, ArrayViews if
>> >         this
>> >         > single-element-extraction comes to a dead end. Thanks,
>> >         Milan!
>> >         >
>> >         >
>> >         >
>> >         > On Fri Nov 28 2014 at 6:00:07 PM Milan Bouchet-Valat
>> >         > <[email protected]> wrote:
>> >         >         Le vendredi 28 novembre 2014 à 01:45 -0800, Todd Leo
>> >         a écrit :
>> >         >         > Hi Fellows,
>> >         >         >
>> >         >         >
>> >         >         > Say I have a 1000 x 1000 matrix, and I'm going to
>> >         do some
>> >         >         calculation
>> >         >         > in a nested for-loop, with each pair of rows/cols
>> >         in the
>> >         >         matrix. But I
>> >         >         > suffered a heavy performance penalty in row/col
>> >         extraction.
>> >         >         Here's my
>> >         >         > minimum reproducible example, which I hope
>> >         explains itself.
>> >         >         >
>> >         >         >
>> >         >         > A = rand(0.:0.01:1.,1000,1000)
>> >         >         >
>> >         >         >
>> >         >         > function test(x)
>> >         >         >     for i in 1:1000, j in 1:1000
>> >         >         >         x[:,i]
>> >         >         >         x[:,j]
>> >         >         >     end
>> >         >         > end
>> >         >         >
>> >         >         >
>> >         >         > test(A) # warm up
>> >         >         > gc()
>> >         >         > @time test(A)
>> >         >         > ## elapsed time: 13.28547939 seconds (16208000080
>> >         bytes
>> >         >         allocated,
>> >         >         > 72.42% gc time)
>> >         >         >
>> >         >         >  It takes 13 seconds, only extracting the
>> >         rows/cols for the
>> >         >         sake of
>> >         >         > further calculations. I'm wondering if anything I
>> >         could do
>> >         >         to improve
>> >         >         > the performance.Thanks in advance.
>> >         >         This is because extracting a row/column creates a
>> >         copy.
>> >         >         Depending on
>> >         >         what calculation you want to do on them, you can:
>> >         >         - use arrays views (which will become the default
>> >         when
>> >         >         extracting slices
>> >         >         in 0.4): https://github.com/JuliaLang/ArrayViews.jl
>> >         >         - manually write loops to go over the row and column
>> >         so that
>> >         >         you only
>> >         >         extract one individual element of the matrix at a
>> >         time
>> >         >
>> >         >
>> >         >         Regards
>> >
>>
>>

Reply via email to