The problem is that right now X[:,1001:end] makes a copy of the array. In 0.4, however, this will instead be a view of the original matrix, so the computing time should be almost the same as for the full matrix.
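To illustrate the copy-versus-view difference, here is a minimal sketch. It assumes a current Julia 1.x release rather than the 0.3/0.4 versions discussed in this thread: as far as I know sumabs2 is no longer available there, so sum(abs2, X, dims=1) stands in for it, and the view is requested explicitly with view(). The matrix is smaller than the 7000x7000 one in the thread only to keep the example quick.

```julia
# Assumption: Julia 1.x. `sum(abs2, X, dims=1)` replaces the thread's `sumabs2(X, 1)`.
X = rand(700, 700)

# Range indexing copies the selected columns into a new array before summing.
d_copy = sum(abs2, X[:, 101:end], dims=1)

# `view` wraps the same columns of X without copying them.
d_view = sum(abs2, view(X, :, 101:size(X, 2)), dims=1)

# Both give the same column-wise sums of squares.
println(isapprox(d_copy, d_view))
```

Prefixing each sum with @time should show the difference in allocations; the exact figures depend on the machine and the Julia version.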
It might also be worth repeating Simon's comment that the floating point power function has special handling of 2. The result is that

julia> @time A.^2;
elapsed time: 1.402791357 seconds (200000256 bytes allocated, 5.90% gc time)

julia> @time A.^2.0;
elapsed time: 0.554241105 seconds (200000256 bytes allocated, 15.04% gc time)

I tend to agree with Simon that special casing of integer 2 would be reasonable.

Kind regards

Andreas Noack

2014-09-09 4:24 GMT-04:00 Ján Dolinský <[email protected]>:

> Hello guys,
>
> Thanks a lot for the lengthy discussions. They helped me a lot to get a
> feeling for what Julia is like. I did some more performance comparisons as
> suggested by the first two posts (thanks a lot for the tips). In the
> meantime I upgraded to v0.3.
>
> X = rand(7000,7000);
> @time d = sum(X.^2, 1);
> elapsed time: 0.573125833 seconds (392056672 bytes allocated, 2.25% gc time)
> @time d = sum(X.*X, 1);
> elapsed time: 0.178715901 seconds (392057080 bytes allocated, 14.06% gc time)
> @time d = sumabs2(X, 1);
> elapsed time: 0.067431808 seconds (56496 bytes allocated)
>
> In Octave then
> X = rand(7000);
> tic; d = sum(X.^2); toc;
> Elapsed time is 0.167578 seconds.
>
> So the ultimate solution is the sumabs2 function, which is a blast. I am
> coming from Matlab/Octave and I would expect X.^2 to be fast "out of the
> box", but nevertheless, if I can get excellent performance by learning
> some new paradigms, I will go for it.
>
> The above tests lead me to another question. I often need to calculate the
> "self" dot product over a portion of a matrix, e.g.
>
> @time d = sumabs2(X[:,1001:end], 1);
> elapsed time: 0.175333366 seconds (336048688 bytes allocated, 7.01% gc time)
>
> Apparently this is not the way to do it in Julia, because working on a
> smaller matrix of 7000x6000 takes more than double the computing time and
> furthermore seems to allocate unnecessary memory.
>
> Best Regards,
> Jan
>
> On Monday, 8 September 2014 10:36:02 UTC+2, Ján Dolinský wrote:
>
>> Hello,
>>
>> I am a new Julia user. I am trying to write a function for computing the
>> "self" dot product of all columns in a matrix, i.e. calculating the square
>> of each element of a matrix and computing a column-wise sum. I am
>> interested in a proper way of doing it because I often need to process
>> large matrices.
>>
>> I first focused on calculating the squares. For testing purposes I use a
>> matrix of random floats of size 7000x7000. All timings here were taken
>> after several repeated runs.
>>
>> I used to do it in Octave (v3.8.1) as follows:
>>
>> tic; X = rand(7000); toc;
>> Elapsed time is 0.579093 seconds.
>> tic; XX = X.^2; toc;
>> Elapsed time is 0.114737 seconds.
>>
>> I tried to do the same in Julia (v0.2.1):
>>
>> @time X = rand(7000,7000);
>> elapsed time: 0.114418731 seconds (392000128 bytes allocated)
>> @time XX = X.^2;
>> elapsed time: 0.369641268 seconds (392000224 bytes allocated)
>>
>> I was surprised to see that Julia is about 3 times slower when
>> calculating a square than my original routine in Octave. I then read
>> "Performance tips" and found out that one should use * instead of
>> raising to small integer powers, for example x*x*x instead of x^3. I
>> therefore tested the following:
>>
>> @time XX = X.*X;
>> elapsed time: 0.146059577 seconds (392000968 bytes allocated)
>>
>> This approach indeed resulted in a much shorter computing time. It is,
>> however, still a little slower than my code in Octave. Can someone advise
>> on any performance tips?
>>
>> I then finally do a sum over all columns of XX to get the "self" dot
>> product, but first I'd like to fix the squaring part.
>>
>> Thanks a lot.
>> Best Regards,
>> Jan
>>
>> p.s. A while ago I found in the Julia manual an example of using the
>> @vectorize macro with a squaring function, but I cannot find it any more.
>> Perhaps the name of the macro was different ...
