In general, if you want the best performance:
(1) avoid creating temporaries
(2) traverse arrays in storage order (column-major in Julia)

Beyond that, any tricks that reduce the operation count can be helpful.

For example, on my machine the following version is 5-6 times faster than your 
version:

function my_make_Ds!(da, Ds)
    for k = 1:size(da,2)
        for j = 1:size(Ds,1)
            tmp = da[j,k]
            Ds[j,j,k] = 0
            # fill both halves at once, exploiting symmetry, and walk the
            # first index in the inner loop so access follows storage order
            for i = j+1:size(Ds,1)
                Ds[j,i,k] = Ds[i,j,k] = (da[i,k]-tmp)^2
            end
        end
    end
end

You could juice a little more out by using @inbounds, but you'd first want to 
check that the sizes make sense and throw an error if not.
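For instance, here's a sketch of what that guarded version might look like (the function name and error message are my own invention, so adapt to taste):

```julia
function my_make_Ds_fast!(da, Ds)
    n, m = size(da)
    # verify the sizes up front so that skipping bounds checks below is safe
    size(Ds) == (n, n, m) || throw(DimensionMismatch("Ds must be $n x $n x $m"))
    @inbounds for k = 1:m
        for j = 1:n
            tmp = da[j,k]
            Ds[j,j,k] = 0
            for i = j+1:n
                Ds[j,i,k] = Ds[i,j,k] = (da[i,k]-tmp)^2
            end
        end
    end
    return Ds
end
```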

I didn't try the second function, but presumably similar tricks should help 
there too.
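As a sketch of what I mean (my_make_K! is my name for it, and I haven't benchmarked it against your data): a loop version avoids the temporary arrays created by Ds[:,:,i], and writing into K element-by-element also sidesteps a subtle bug in your make_K! — the line K = exp(-2 * theta[1] .- .5 * K) rebinds the local variable K rather than modifying the caller's array.

```julia
function my_make_K!(theta, Ds, K)
    n = size(K, 1)
    fill!(K, 0)
    for k = 1:size(Ds, 3)
        w = exp(-2*theta[k+1])        # hoist the weight out of the inner loops
        for j = 1:n, i = 1:n
            K[i,j] += w*Ds[i,j,k]     # accumulate in storage order
        end
    end
    for j = 1:n, i = 1:n
        K[i,j] = exp(-2*theta[1] - 0.5*K[i,j])
    end
    return K
end
```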

You may not want to devectorize everything; using the profiler you can 
identify which bits of your code actually matter, and then work hard on 
optimizing those. The Julia equivalent of MATLAB's bsxfun(@minus, a, b) is 
just a .- b. There's also a nifty

    sum!(result, input, region)

function for accumulating sums across a dimension of the array.
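A quick illustration of both (here the dimension to sum over is implied by the singleton dimensions in the shape of result, so sum! can be called with just two arguments):

```julia
a = [1.0, 2.0, 3.0]
M = a .- a'            # 3x3 matrix of pairwise differences, like bsxfun(@minus, a, a')

result = zeros(3, 1)
sum!(result, M)        # row sums accumulated into the preallocated result
```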

--Tim

On Tuesday, May 13, 2014 03:36:52 PM Thomas Covert wrote:
> I'm trying to wean myself off MATLAB and am trying to figure out how to do
> things the julian way by porting some code that fits a Gaussian process.
>  Something my code does over and over again while fitting the GP is compute
> the value of a covariance function - more or less a matrix of pairwise
> distances passed through a kernel function.
> 
> I'd love for my julia code to run as fast (or faster) than what I'm
> currently attaining in MATLAB, but right now this isn't happening.  below
> I've attached julia and MATLAB code to do a simplified version of the task:
> (1) compute 3 matrices of the pairwise distances between all points in a
> vector for 3 vectors - call these M1, M2 and M3 (2) compute exp(theta0 - .5
> * (M1*exp(-2*theta1) + M2*exp(-2*theta2) + M3*exp(-2*theta3))).  The 2nd
> step is the one that needs to be really fast, since I'm ultimately trying
> to estimate the thetas, so the second step gets called many times.
> 
> My MATLAB code is certainly not optimized, but I've been fiddling with the
> Julia code for a while, trying devectorization tricks all day.  On my
> machine (mid-2009 macbook pro), the MATLAB code (including memory
> allocation, etc) runs in about 1.2 seconds, with the key part (converting
> the distance matrices into a covariance function evaluation) taking about
> .5 seconds.  The julia code is slower: just the function evaluation of the
> second piece takes 1.2 seconds.
> 
> Julia experts, what am I doing wrong?  I assume it has something to do with
> being smarter about the order of operations across my multi-dimensional
> array, but I'm not sure exactly how.  Note, the big dimension of this
> problem (2699) is what I actually face in practice, so it seemed relevant
> to test it here.
> 
> Thanks in advance for any advice!
> 
> MATLAB CODE
> % initialization
> tic
> da = randn(2699,3);
> Ds = zeros(size(da,1),size(da,1),size(da,2));
> toc
> 
> % equivalent code for make_Ds!
> tic
> for i = 1:size(da,2)
>     Ds(:,:,i) = bsxfun(@minus,da(:,i),da(:,i)') .^ 2;
> end
> toc
> 
> % more initialization
> tic
> K = zeros(size(da,1),size(da,1));
> theta = ones(5,1);
> toc
> 
> % equivalent code for make_K!
> tic
> for i = 1:size(Ds,3)
>     K = K + Ds(:,:,i) * exp(-2 * theta(i + 1));
> end
> K = exp(-2 * theta(1) - .5 * K);
> toc
> 
> JULIA CODE
> using Distance
> 
> ## FUNCTIONS ##
> # this function should just fill out a 3-dimensional array of pairwise
> # squared distances
> function make_Ds!(da::Array{Float64},Ds::Array{Float64})
>     for i = 1:size(da,2)
>         Ds[:,:,i] = pairwise(SqEuclidean(),da[:,i]');
>     end
> end
> 
> # this function adds up these squared differences, weights them by
> # appropriate length scales, and passes them through a normal kernel
> function make_K!(theta::Array{Float64},Ds::Array{Float64},K::Array{Float64})
>     for i = 1:size(Ds,3)
>         K += Ds[:,:,i] * exp(-2 * theta[i+1]);
>     end
>     K = exp(-2 * theta[1] .- .5 * K);
> end
> 
> ## CODE ##
> # this code generates 3 std normal vectors, computes the pairwise distances
> # between the points, and turns these pairwise distances into a covariance
> # function value
> 
> da = randn(2699,3);
> 
> Ds = zeros(size(da,1),size(da,1),size(da,2));
> make_Ds!(da,Ds);
> 
> K = zeros(Float64,size(da,1),size(da,1));
> make_K!(ones(5,1),Ds,K);
> 
> # run things through a second time to avoid accounting for compiler time
> tic(); make_Ds!(da,Ds); toc()
> K = zeros(Float64,size(da,1),size(da,1));
> tic(); make_K!(ones(5,1),Ds,K); toc()
