I recommend reading through this blog post: http://julialang.org/blog/2013/09/fast-numeric/. You may also want to read http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/ while you're at it; just replace "R" with "Octave" or "Matlab" and the conclusion is basically the same.
The macro you were looking for was probably @devec from the Devectorize.jl package. If you don't actually need the intermediate matrix of 49 million squared values, allocating memory to store them all is a huge waste of resources. If you upgrade to Julia 0.3.0, there's a function sumabs2(x, dim) that computes sum(abs(x).^2, dim) without allocating an intermediate temporary array for abs(x).^2. On my laptop, sumabs2(X, 1) is about 6 times faster than sum(X.*X, 1) and allocates hardly any memory at all, whereas sum(X.*X, 1) needs to allocate almost 400 MB.

On Monday, September 8, 2014 1:36:02 AM UTC-7, Ján Dolinský wrote:
> Hello,
>
> I am a new Julia user. I am trying to write a function for computing the
> "self" dot product of all columns in a matrix, i.e. calculating the square
> of each element of the matrix and computing a column-wise sum. I am
> interested in the proper way of doing this because I often need to process
> large matrices.
>
> I first put the focus on calculating the squares. For testing purposes I
> use a matrix of random floats of size 7000x7000. All timings here were
> taken after several repeated runs.
>
> I used to do it in Octave (v3.8.1) as follows:
> tic; X = rand(7000); toc;
> Elapsed time is 0.579093 seconds.
> tic; XX = X.^2; toc;
> Elapsed time is 0.114737 seconds.
>
> I tried to do the same in Julia (v0.2.1):
> @time X = rand(7000,7000);
> elapsed time: 0.114418731 seconds (392000128 bytes allocated)
> @time XX = X.^2;
> elapsed time: 0.369641268 seconds (392000224 bytes allocated)
>
> I was surprised to see that Julia is about 3 times slower than my original
> Octave routine when calculating the squares. I then read "Performance
> tips" and found out that one should use * instead of raising to small
> integer powers, for example x*x*x instead of x^3. I therefore tested the
> following:
> @time XX = X.*X;
> elapsed time: 0.146059577 seconds (392000968 bytes allocated)
>
> This approach indeed resulted in a much shorter computing time.
> It is still, however, a little slower than my Octave code. Can someone
> advise on any performance tips?
>
> I then finally do a sum over all columns of XX to get the "self" dot
> product, but first I'd like to fix the squaring part.
>
> Thanks a lot.
> Best Regards,
> Jan
>
> p.s. In the Julia manual I found a while ago an example of using a
> @vectorize macro with a squaring function but cannot find it any more.
> Perhaps the name of the macro was different ...
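For the column-wise sum of squares itself, you can also just write the loop by hand and skip the temporary 7000x7000 matrix entirely; devectorized loops like this are fast in Julia. A minimal sketch (colsumsq is a name I made up for illustration, not a library function):

```julia
# Column-wise "self" dot product without allocating an intermediate
# squared matrix: one pass over each column, accumulating x*x.
function colsumsq(X)
    m, n = size(X)
    out = zeros(eltype(X), n)
    for j in 1:n
        s = zero(eltype(X))
        for i in 1:m
            s += X[i, j] * X[i, j]  # x*x rather than x^2, per the performance tips
        end
        out[j] = s
    end
    return out
end
```

I would expect this to be in the same ballpark as sumabs2(X, 1), since both avoid materializing the squared values (note colsumsq returns a vector rather than a 1xN matrix).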
