Finally, I found that Octave has an equivalent to sumabs2() called sumsq(). Just for the sake of completeness, here are the timings:
Octave:
X = rand(7000); tic; sumsq(X); toc;
Elapsed time is 0.0616651 seconds.

Julia v0.3:
@time X = rand(7000,7000);
elapsed time: 0.285218597 seconds (392000160 bytes allocated)
@time sumabs2(X, 1);
elapsed time: 0.05705666 seconds (56496 bytes allocated)

Essentially, the speed is about the same, with Julia being a little faster. It was, however, interesting to observe that @time X = rand(7000,7000); is about 2.5 times slower in Julia 0.3 than it was in Julia 0.2. In Julia (v0.2.1):
@time X = rand(7000,7000);
elapsed time: 0.114418731 seconds (392000128 bytes allocated)

Jan

On Tuesday, September 9, 2014 at 17:06:59 UTC+2, Ján Dolinský wrote:
>
> Hello Andreas,
>
> Thanks for the tip. I'll check it out. Thumbs up for the 0.4!
>
> Jan
>
> On 09.09.2014 17:04, Andreas Noack wrote:
>
> If you need the speed now you can try one of the packages ArrayViews or
> ArrayViewsAPL. It is functionality similar to what is in these packages
> that we are trying to include in base.
>
> Kind regards
>
> Andreas Noack
>
> 2014-09-09 9:38 GMT-04:00 Ján Dolinský:
>
>> OK, so basically there is nothing wrong with the syntax X[:,1001:end]?
>>
>> d = sumabs2(X[:,1001:end], 1);
>>
>> and I should just wait until v0.4 is available (perhaps available soon
>> in the Julia Nightlies PPA).
>>
>> I did the benchmark with the floating-point power function based on
>> Simon's comment. Here are my results (after a couple of repeated runs):
>>
>> @time X.^2;
>> elapsed time: 0.511988142 seconds (392000256 bytes allocated, 2.52% gc time)
>> @time X.^2.0;
>> elapsed time: 0.411791612 seconds (392000256 bytes allocated, 3.12% gc time)
>>
>> Thanks,
>> Jan Dolinsky
>>
>> On 09.09.2014 14:06, Andreas Noack wrote:
>>
>> The problem is that right now X[:,1001:end] makes a copy of the array.
>> However, in 0.4 this will instead be a view of the original matrix and
>> therefore the computing time should be almost the same.
>>
>> It might also be worth repeating Simon's comment that the floating-point
>> power function has special handling of 2. The result is that
>>
>> julia> @time A.^2;
>> elapsed time: 1.402791357 seconds (200000256 bytes allocated, 5.90% gc time)
>>
>> julia> @time A.^2.0;
>> elapsed time: 0.554241105 seconds (200000256 bytes allocated, 15.04% gc time)
>>
>> I tend to agree with Simon that special-casing the integer 2 would be
>> reasonable.
>>
>> Kind regards
>>
>> Andreas Noack
>>
>> 2014-09-09 4:24 GMT-04:00 Ján Dolinský:
>>
>>> Hello guys,
>>>
>>> Thanks a lot for the lengthy discussion. It helped me a lot to get a
>>> feeling for what Julia is like. I did some more performance comparisons,
>>> as suggested in the first two posts (thanks a lot for the tips). In the
>>> meantime I upgraded to v0.3.
>>>
>>> X = rand(7000,7000);
>>> @time d = sum(X.^2, 1);
>>> elapsed time: 0.573125833 seconds (392056672 bytes allocated, 2.25% gc time)
>>> @time d = sum(X.*X, 1);
>>> elapsed time: 0.178715901 seconds (392057080 bytes allocated, 14.06% gc time)
>>> @time d = sumabs2(X, 1);
>>> elapsed time: 0.067431808 seconds (56496 bytes allocated)
>>>
>>> In Octave:
>>> X = rand(7000);
>>> tic; d = sum(X.^2); toc;
>>> Elapsed time is 0.167578 seconds.
>>>
>>> So the ultimate solution is the sumabs2 function, which is a blast. I am
>>> coming from Matlab/Octave and I would expect X.^2 to be fast "out of the
>>> box", but if I can get excellent performance by learning some new
>>> paradigms, I will go for it.
>>>
>>> The above tests lead me to another question. I often need to calculate
>>> the "self" dot product over a portion of a matrix, e.g.
>>> @time d = sumabs2(X[:,1001:end], 1);
>>> elapsed time: 0.175333366 seconds (336048688 bytes allocated, 7.01% gc time)
>>>
>>> Apparently this is not the way to do it in Julia, because working on the
>>> smaller 7000x6000 matrix takes more than double the computing time and,
>>> furthermore, it seems to allocate unnecessary memory.
>>>
>>> Best Regards,
>>> Jan
>>>
>>> On Monday, September 8, 2014 at 10:36:02 UTC+2, Ján Dolinský wrote:
>>>
>>>> Hello,
>>>>
>>>> I am a new Julia user. I am trying to write a function that computes
>>>> the "self" dot product of all columns in a matrix, i.e. calculating the
>>>> square of each element of the matrix and computing a column-wise sum. I
>>>> am interested in the proper way of doing it because I often need to
>>>> process large matrices.
>>>>
>>>> I first focused on calculating the squares. For testing purposes I use
>>>> a matrix of random floats of size 7000x7000. All timings here were
>>>> taken after several repeated runs.
>>>>
>>>> I used to do it in Octave (v3.8.1) as follows:
>>>> tic; X = rand(7000); toc;
>>>> Elapsed time is 0.579093 seconds.
>>>> tic; XX = X.^2; toc;
>>>> Elapsed time is 0.114737 seconds.
>>>>
>>>> I tried to do the same in Julia (v0.2.1):
>>>> @time X = rand(7000,7000);
>>>> elapsed time: 0.114418731 seconds (392000128 bytes allocated)
>>>> @time XX = X.^2;
>>>> elapsed time: 0.369641268 seconds (392000224 bytes allocated)
>>>>
>>>> I was surprised to see that Julia is about 3 times slower than my
>>>> original Octave routine when calculating the square. I then read
>>>> "Performance tips" and found out that one should use multiplication
>>>> instead of raising to small integer powers, for example x*x*x instead
>>>> of x^3. I therefore tested the following:
>>>> @time XX = X.*X;
>>>> elapsed time: 0.146059577 seconds (392000968 bytes allocated)
>>>>
>>>> This approach indeed resulted in a much shorter computing time. It is
>>>> still, however, a little slower than my code in Octave.
>>>> Can someone advise on any performance tips?
>>>>
>>>> I will then finally sum over all columns of XX to get the "self" dot
>>>> product, but first I'd like to fix the squaring part.
>>>>
>>>> Thanks a lot.
>>>> Best Regards,
>>>> Jan
>>>>
>>>> p.s. A while ago I found an example in the Julia manual of using the
>>>> @vectorize macro with a squaring function, but I cannot find it
>>>> anymore. Perhaps the name of the macro was different ...
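[Editor's note: the sumabs2 function praised in this thread was deprecated in later Julia releases. A minimal sketch of the same column-wise sum of squares in current Julia (assuming v1.x), alongside the naive variant it outperforms, for readers trying to reproduce these timings today:]

```julia
X = rand(1000, 1000)  # smaller than the thread's 7000x7000, same idea

# Naive: allocates a full temporary matrix for X.^2 before summing.
d1 = sum(X .^ 2, dims=1)

# Fused reduction: abs2 squares each element on the fly, no large
# temporary. This is the modern spelling of the thread's sumabs2(X, 1).
d2 = sum(abs2, X, dims=1)

d1 ≈ d2  # both give the same 1x1000 row of column sums of squares
```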
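[Editor's note: Andreas's point that X[:,1001:end] copies, and that a view avoids the copy, can be checked directly. The view functionality discussed for 0.4 landed as @view/view in later releases; a sketch assuming current Julia:]

```julia
X = rand(1000, 1000)

# Slicing with [] copies: a fresh 1000x500 matrix is allocated first,
# which is the extra time and memory observed in the thread.
d_copy = sum(abs2, X[:, 501:end], dims=1)

# @view wraps the same storage without copying, so the reduction
# walks the original matrix directly.
d_view = sum(abs2, @view(X[:, 501:end]), dims=1)

d_copy ≈ d_view  # identical results, very different allocation behavior
```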
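[Editor's note: the reason sumabs2 beats sum(X.^2, 1) is easiest to see in a hand-rolled loop: the square is computed per element inside the reduction, so no NxN temporary is ever materialized. A sketch of that idea; colsumsq is a hypothetical name, not a Base function:]

```julia
# Column-wise sum of squares without any temporary matrix.
function colsumsq(X::AbstractMatrix)
    m, n = size(X)
    d = zeros(eltype(X), 1, n)
    @inbounds for j in 1:n
        s = zero(eltype(X))
        for i in 1:m
            s += X[i, j]^2   # square on the fly, accumulate per column
        end
        d[j] = s
    end
    return d
end
```

Looping over columns in the inner dimension matches Julia's column-major storage, which is part of why this style competes with the built-in reductions.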
