That's a very satisfying result :-).

--Tim

On Thursday, March 12, 2015 07:19:01 PM Milan Bouchet-Valat wrote:
> On Thursday, March 12, 2015 at 11:01 -0500, Tim Holy wrote:
> > This is something that many people (understandably) have a hard time
> > appreciating, so I think this post should be framed and put up on the
> > Julia wall.
> > 
> > We go to considerable lengths to try to make code work efficiently in the
> > general case (check out subarray.jl and subarray2.jl in master some
> > time...), but sometimes there's no competing with a hand-rolled version
> > for a particular case. Folks should not be shy to implement such tricks
> > in their own code.
> Though with the new array views in 0.4, the vectorized version should be
> more efficient than in 0.3. I've tried it, and indeed it looks like
> writing the loop by hand is not really needed, though the loop is still
> faster and uses less RAM:
> 
> X = rand(100_000, 5)
> 
> function f1(X, i, j)
>     for _ in 1:1000
>         X[:, i], X[:, j] = X[:, j], X[:, i]
>     end
> end
> 
> function f2(X, i, j)
>     for _ in 1:1000
>         a = sub(X, :, i)
>         b = sub(X, :, j)
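>         # NB: `a` is a view, so by the time `b[:] = a` runs it already
>         # holds column j; column i is overwritten, but nothing is swapped.
>         a[:], b[:] = b, a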
>     end
> end
> 
> function f3(X, i, j)
>     for _ in 1:1000
>         @inbounds for k in 1:size(X, 1)
>             X[k, i], X[k, j] = X[k, j], X[k, i]
>         end
>     end
> end
> 
> 
> julia> f1(X, 1, 5); f2(X, 1, 5); f3(X, 1, 5);
> 
> julia> @time f1(X, 1, 5)
> elapsed time: 1.027090951 seconds (1526 MB allocated, 3.63% gc time in 69 pauses with 0 full sweep)
> 
> julia> @time f2(X, 1, 5)
> elapsed time: 0.172375013 seconds (390 kB allocated)
> 
> julia> @time f3(X, 1, 5)
> elapsed time: 0.155069259 seconds (80 bytes allocated)
> 
> 
> Regards
> 
> > --Tim
> > 
> > On Thursday, March 12, 2015 07:49:49 AM Steven G. Johnson wrote:
> > > As a general rule, with Julia one needs to unlearn the instinct (from
> > > Matlab or Python) that "efficiency == clever use of library functions",
> > > which turns all optimization questions into "is there a built-in
> > > function for X" (and if the answer is "no" you are out of luck). Loops
> > > are fast, and you can easily beat general-purpose library functions
> > > with your own special-purpose code.
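
For readers on current Julia (1.x), where `sub` was replaced by `view` starting with 0.5, here is a minimal sketch of the view-based and loop-based column swaps from the benchmark above. The function names `swapcols_loop!` and `swapcols_view!` are hypothetical, chosen only for illustration. The view-based version takes an explicit copy of one column first, since a view reflects every write made through it; the loop version, like f3, swaps element by element and needs no temporary array.

```julia
# Sketch for Julia 1.x (assumption: `view` is the modern replacement for 0.4's `sub`).
# `swapcols_loop!` and `swapcols_view!` are hypothetical names for illustration.

# Hand-written loop, in the style of f3: element-by-element swap, no temporaries.
function swapcols_loop!(X::AbstractMatrix, i::Integer, j::Integer)
    # Validate both column indices once, so the loop can skip bounds checks.
    checkbounds(X, :, i)
    checkbounds(X, :, j)
    @inbounds for k in axes(X, 1)
        X[k, i], X[k, j] = X[k, j], X[k, i]
    end
    return X
end

# View-based version: column i must be snapshotted first, because after
# `a .= b` the view `a` already holds column j's data.
function swapcols_view!(X::AbstractMatrix, i::Integer, j::Integer)
    a = view(X, :, i)
    b = view(X, :, j)
    tmp = copy(a)   # allocates one column's worth of storage
    a .= b
    b .= tmp
    return X
end

X = [1 2 3; 4 5 6]
swapcols_loop!(X, 1, 3)
Y = [1 2 3; 4 5 6]
swapcols_view!(Y, 1, 3)
@assert X == Y == [3 2 1; 6 5 4]
```

The loop version checks bounds once up front and then uses `@inbounds`, the same trade-off f3 makes; the view version allocates a temporary the size of one column on every call.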
