I'm trying to understand parallelization in Julia, but as a former MATLAB
user who is used to just sprinkling a few "parfor"s around my code,
I'm not having much success.
Here's my test program, which runs in a similar way to the actual program I
want to parallelize. The basic feature is that I have a number of variables
stored in arrays and I want to call a function using each possible
combination of these variables. The function itself consists of an
integration and a minimization, and takes around 0.005 seconds to compute.
When running the program below, two things stand out:
(i) The results matrices of all parallelized calculations are empty. I seem
to be doing something fundamentally wrong in the way I'm structuring this.
(ii) The further I move up along the chain of for loops, the shorter my
calculation time; when I parallelize the outer-most loop, the calculation
time drops to 0.02 seconds, which seems to indicate that most computations
are simply not performed.
As you can tell, I don't really know what I'm doing here, so any help would
be greatly appreciated!
Test program:
addprocs(3)
@everywhere using Distributions
@everywhere using QuantEcon
@everywhere function f{T<:Float64}(x1::T, x2::T, x3::T)
    distr = LogNormal(x1+2, x2+2)
    function f2(x, x1=x1, x2=x2, x3=x3)
        (x*x1 + x*x2 - x*x3).*pdf(distr, x)
    end
    quadrect(f2, 500, x1, x2+2)
end
X1 = rand(10, 1)
X2 = rand(10, 1)
X3 = rand(10, 1)
Results = zeros(10, 10, 10)
ResultsInner = zeros(10, 10, 10)
ResultsMiddle = zeros(10, 10, 10)
ResultsOuter = zeros(10, 10, 10)
tic()
for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            Results[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The one-core loop takes %.2f seconds\n" toq()
tic()
for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        @parallel for k = 1:10
            x3 = X3[k]
            ResultsInner[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (inner) takes %.2f seconds\n" toq()
tic()
for i = 1:10
    x1 = X1[i]
    @parallel for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            ResultsMiddle[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (middle) takes %.2f seconds\n" toq()
tic()
@parallel for i = 1:10
    x1 = X1[i]
    for j = 1:10
        x2 = X2[j]
        for k = 1:10
            x3 = X3[k]
            ResultsOuter[i, j, k] = f(x1, x2, x3)
        end
    end
end
@printf "The multi-core loop (outer) takes %.2f seconds\n" toq()