I'm trying to understand parallelization in Julia, but as a former MATLAB 
user who is used to just sprinkling a few "parfor"s around my code, 
I'm not having much success.
Here's my test program, which runs in a similar way to the actual program I 
want to parallelize. The basic feature is that I have a number of variables 
stored in arrays and I want to call a function using each possible 
combination of these variables. The function itself consists of an 
integration and a minimization, and takes around 0.005 seconds to compute.

When running the program below, two things stand out:
(i) The results matrices of all parallelized calculations are empty. I seem 
to be doing something fundamentally wrong in the way I'm structuring this.
(ii) The further I move up along the chain of for loops, the shorter my 
calculation time; when I parallelize the outer-most loop, the calculation 
time drops to 0.02 seconds, which seems to indicate that most computations 
are simply not performed. 

As you can tell, I don't really know what I'm doing here so any help would 
be greatly appreciated!

Test program:

addprocs(3)

@everywhere using Distributions
@everywhere using QuantEcon

@everywhere function f{T<:Float64}(x1::T, x2::T, x3::T)
    
    distr = LogNormal(x1+2, x2+2)
    
    function f2(x, x1=x1, x2=x2, x3=x3)
        (x*x1 + x*x2 - x*x3).*pdf(distr, x)
    end
    
    quadrect(f2, 500, x1, x2+2)
end

X1 = rand(10, 1)
X2 = rand(10, 1)
X3 = rand(10, 1)
Results = zeros(10, 10, 10)
ResultsInner = zeros(10, 10, 10)
ResultsMiddle = zeros(10, 10, 10)
ResultsOuter = zeros(10, 10, 10)

tic()
for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            Results[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The one-core loop takes %.2f seconds\n" toq()

tic()
for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        @parallel for k = 1:10
            x3 = X3[k]
            ResultsInner[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (inner) takes %.2f seconds\n" toq()

tic()
for i = 1:10
    x1 = X1[i]
    @parallel for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            ResultsMiddle[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (middle) takes %.2f seconds\n" toq()

tic()
@parallel for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            ResultsOuter[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (outer) takes %.2f seconds\n" toq()
