That's a very satisfying result :-).

--Tim
On Thursday, March 12, 2015 07:19:01 PM Milan Bouchet-Valat wrote:
> On Thursday, March 12, 2015 at 11:01 -0500, Tim Holy wrote:
> > This is something that many people (understandably) have a hard time
> > appreciating, so I think this post should be framed and put up on the
> > julia wall.
> >
> > We go to considerable lengths to try to make code work efficiently in the
> > general case (check out subarray.jl and subarray2.jl in master some
> > time...), but sometimes there's no competing with a hand-rolled version
> > for a particular case. Folks should not be shy to implement such tricks
> > in their own code.
> Though with the new array views in 0.4, the vectorized version should be
> more efficient than in 0.3. I've tried it, and indeed it looks like
> unrolling is not really needed, though it's still faster and uses less
> RAM:
>
> X = rand(100_000, 5)
>
> function f1(X, i, j)
>     for _ in 1:1000
>         X[:, i], X[:, j] = X[:, j], X[:, i]
>     end
> end
>
> function f2(X, i, j)
>     for _ in 1:1000
>         a = sub(X, :, i)
>         b = sub(X, :, j)
>         a[:], b[:] = b, a
>     end
> end
>
> function f3(X, i, j)
>     for _ in 1:1000
>         @inbounds for k in 1:size(X, 1)
>             X[k, i], X[k, j] = X[k, j], X[k, i]
>         end
>     end
> end
>
>
> julia> f1(X, 1, 5); f2(X, 1, 5); f3(X, 1, 5);
>
> julia> @time f1(X, 1, 5)
> elapsed time: 1.027090951 seconds (1526 MB allocated, 3.63% gc time in
> 69 pauses with 0 full sweep)
>
> julia> @time f2(X, 1, 5)
> elapsed time: 0.172375013 seconds (390 kB allocated)
>
> julia> @time f3(X, 1, 5)
> elapsed time: 0.155069259 seconds (80 bytes allocated)
>
>
> Regards
>
> > --Tim
> >
> > On Thursday, March 12, 2015 07:49:49 AM Steven G. Johnson wrote:
> > > As a general rule, with Julia one needs to unlearn the instinct (from
> > > Matlab or Python) that "efficiency == clever use of library functions",
> > > which turns all optimization questions into "is there a built-in
> > > function for X" (and if the answer is "no" you are out of luck). Loops
> > > are fast, and you can easily beat general-purpose library functions
> > > with your own special-purpose code.
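
For anyone reading this thread on a recent Julia: `sub` was later replaced by `view`, so f2 above will not run as written on Julia 1.x. Below is a rough sketch of the three approaches in current syntax. The function names (swap_copy!, swap_view!, swap_loop!) are my own and not from the thread, and this assumes Julia 1.x. One thing to watch, as far as I can tell: because views alias X, the pattern `a[:], b[:] = b, a` in f2 overwrites `a` before `b` is assigned from it, so both columns end up holding the old column j. The view version below therefore copies one column first.

```julia
# Sketch for Julia 1.x; function names are illustrative, not from the thread.

# Vectorized swap: each X[:, k] on the right-hand side allocates a copy,
# so the two columns are safely exchanged at the cost of temporaries.
function swap_copy!(X, i, j)
    X[:, i], X[:, j] = X[:, j], X[:, i]
    return X
end

# View-based swap: `view` replaces the 0.4-era `sub`. Views alias X, so one
# column has to be copied before it is overwritten.
function swap_view!(X, i, j)
    a = view(X, :, i)
    b = view(X, :, j)
    tmp = copy(a)   # one temporary column
    a .= b
    b .= tmp
    return X
end

# Hand-written loop: swaps element by element with no temporary arrays.
# @inbounds assumes i and j are valid column indices of X.
function swap_loop!(X, i, j)
    @inbounds for k in axes(X, 1)
        X[k, i], X[k, j] = X[k, j], X[k, i]
    end
    return X
end

X = rand(100_000, 5)
expected = X[:, [5, 2, 3, 4, 1]]   # columns 1 and 5 exchanged

# Check that all three variants produce the same swapped matrix.
for f in (swap_copy!, swap_view!, swap_loop!)
    Y = f(copy(X), 1, 5)
    @assert Y == expected
end
```

To compare them on your own machine, call each function once to compile it and then wrap a second call in `@time`, as in the session quoted above. I have not included timings here since they depend on the Julia version and hardware.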
